RESEARCH OVERVIEW


This report covers the period from April 1, 1983 through September 30,
1983. The research discussed here is described in more detail in several
published and unpublished reports cited below.

A better understanding of the limiting mechanisms in heterojunction
transistors is leading to improved devices. Installation of an MBE system
will allow production of the first samples in December 1983.

Bipolar  transistors  have  been  fabricated  on  recrystallized  silicon  films 
for  the  first  time.  These  are  designed  to  study  the  properties  of  the  film. 

A fully self-aligned JCMOS device has been fabricated with partial success.
Another three-dimensional device structure, the staggered CMOS device, has
been plagued by leaks in the oxide under the recrystallized layer.

A high-performance FIR filter has been designed at the logic level as a
test bed for retiming and size optimization algorithms. In a related effort,
progress has been made on a technique for automatically testing adherence to a
design methodology. Among the things checked are threshold drop limits,
pullup network topology, information sources and sinks, charge-sharing faults,
and races.

Several improvements to the PI placement and interconnect program have
been made, including automatic power/ground routing. The program will be
studied at an industrial test site. About a dozen systolic-array
transformations have been identified as potentially important to designers.
One, retiming, has been shown to be computationally feasible. Transformations
between two-dimensional and three-dimensional interconnect structures have
been derived.

A first production run of SCHEME-81 chips had an abnormally high diffusion
resistance and yielded no working chips. A second production run produced
some chips that appear to work almost completely. Reliable sequencing,
conditional branching, dispatching, and memory references have been observed.
These chips appear to be fast, supporting a clock rate of 1 MHz. Much of the
user interface of the Schema system has been implemented, along with the
basis for the DC analysis. The underlying data base appears to be reliable.

A  list  of  some  of  the  published  and  unpublished  papers  appears  after  the 
detailed  descriptions  of  the  various  projects.  Some  of  these  are  reprinted 
with  this  report. 

HETEROSTRUCTURE-LOGIC TECHNOLOGY


Fabrication of triply implanted heterojunction npn bipolar transistors
has continued, with some improvements in performance but with gains remaining
low. In particular, the excessive base-collector junction leakage, which has
dominated the characteristics of all earlier devices, was eliminated in the
latest devices by removing the sputtered SiO2 dielectric film covering the
devices and etching ~400 Å off the exposed (In,Ga)As surfaces. At the same
time, however, the gain decreased to approximately 0.2. This severe
degradation can only be explained by increased lateral emitter-base injection,
which results in a large decrease in emitter efficiency. An attempt will be
made to confirm this explanation by etching a mesa around these devices. If
this proves to be the correct explanation, an n+ collar will be tried, to
defeat this lateral injection while maintaining a planar structure. The
elimination of the collector-base leakage is a significant advance because it
shows that the damage in this doubly implanted junction (n-implant into
p-implant) is low enough to give low-leakage diodes. It remains to be seen
whether the minority carrier diffusion length in the implanted base will be
long enough to give the desired gains of 10 to 20.

Little progress has been made on improving the gain of the lateral pnp
transistors first reported about a year ago. We understand the importance of
the surface better now, however, and feel it plays a crucial role in
determining the gain of these devices. We are presently fabricating lateral
pnp transistors having a field plate (MIS gate) over the base region. This
electrode will allow control of the surface potential, and thereby of the
surface recombination velocity at the exposed base region surface. This should
allow both a quantification of the role of surface recombination in the
lateral transistor gain and control of this loss mechanism. If large surface
recombination is the cause of the low gain (~4), a shallow n+ implant may
solve the problem, as it will tend to shield the minority carrier holes from
the surface.

During this report period, fabrication of grown-junction npn transistors
has started. These devices are made by first growing an epitaxial n-InP,
p+-(In,Ga)As, n-(In,Ga)As heterostructure on an n+-InP substrate. Liquid
phase epitaxy is being used to grow the layers, and manganese is being used as
the base dopant. A multiple-energy Be implantation is done to provide access
to the p+ base, ohmic contacts are applied, and mesas are etched to separate
the devices. With heavily Mn-doped bases, low gains are seen both on the
three-terminal heterojunction transistors just described and on
phototransistors, which are devices lacking the base access implant. Recent
devices grown from melts containing less Mn show significantly better
phototransistor gain (estimated at ~100), but the HJBT gain improves by very
much less (still < 5). Present work directed at understanding this discrepancy
is focusing on the bottom implanted emitter-base junction. This wide band gap
homojunction is supposed to be blocking rather than conducting when in
parallel with the E-B heterojunction; an attempt will be made to demonstrate
whether or not this is the case and, if it is not, to understand how to make
it function as it must. This point also affects the lateral pnp transistors
and is a critical issue.


The work on ion implantation, particularly on improving the effectiveness
of the post-implant anneal, has led to continued study of the use of a
high-intensity arc lamp to do rapid isothermal anneals of implanted layers.
This work began with GaAs, because the vast literature on this material gives
a good benchmark with which to develop the basic arc lamp annealing
procedures. Work next centered on InP and now addresses arc lamp annealing of
(In,Ga)As (see publications).

During this report period, an undergraduate thesis was completed (see
publications) in which the first measurements of specific contact resistance
on p-type (In,Ga)As were made. It was found that contact resistances in the
low 10^-6 ohm-cm^2 range could easily be obtained. At the same time, the
apparent specific resistance was smaller for smaller-area contacts. Both of
these preliminary observations are very important for devices; these
measurements are now being repeated and extended.

Finally, the molecular beam epitaxy system was delivered in July and
assembled in September and October. While there are still pieces back-ordered
from the manufacturer, cell bake-out procedures can now be started. It is
hoped to have the cells baked out and loaded by the end of November and the
first GaAs and (Al,Ga)As layers grown in December. The work will move on to
(In,Ga)As after the first of the year. Dr. Al Cho of Bell Laboratories has
agreed to provide several graduate students with training on (In,Ga)As growth
on his systems in January.

THREE-DIMENSIONAL  DEVICES  AND  INTERCONNECTIONS 


Two different self-aligned 3-D structures have been pursued. The first
is a stacked CMOS gate (JCMOS) and the second is a staggered CMOS latch.
The first structure is primarily applicable to random logic, while the second
is primarily suitable for static-RAM applications. The emphasis in this
research at this point is on development of novel structures that take
advantage of silicon-on-insulator (SOI) technology rather than on development
of the SOI technology itself.

A partially successful fabrication of the JCMOS structure has been
completed. The key novel feature is that the joint gate is self-aligned to
both the upper and lower channel regions. Thus, the structure is scalable in
lateral dimensions. Laser recrystallization was used for the topmost layer,
which forms the NMOS channel region. Since this project does not have a laser
apparatus, the recrystallizations were done at the Sperry Research Center in
Sudbury, MA. It appears that laser melting is the only practical technique
for recrystallization of stacked structures at present. Recrystallizing the
top polysilicon layer without affecting the integrity of the two underlying
thin gate oxides remains a significant challenge. This step is by far the
most important yield-limiting step. Working JCMOS inverters have been
obtained, but the apparent mobility of the NMOS device, which resides in the
recrystallized layer, is very low, of order 10 cm^2/V-sec. This effect is now
being investigated, along with a modified version of the JCMOS which promises
to yield a more planar surface than the current version.

Fully functional staggered CMOS devices have not yet been obtained. So
far these devices have been plagued by leaky oxides which result from the
thermal shock of the laser recrystallization. In fact, there is little doubt
that melt-induced recrystallization is not going to be a practical technique
for production of any kind of 3-D devices. However, it is still the only
usable technique for our experiments. Certainly, considerable effort must be
placed on the development of a practical non-melting single-crystal SOI
preparation technique.

During this reporting period, bipolar transistors in zone-melt
recrystallized films have been demonstrated for the first time. This effort,
although not in the mainstream of 3-D devices, was undertaken as a study of SOI
films. Reasonably good recombination lifetimes were found (30-50 ns), but
they are surprisingly lower than independently measured generation lifetimes
from deep depletion capacitance recovery. These latter lifetimes were of
order 1 microsecond. At this point it is not clear whether this is a real
property of these films or whether the recombination lifetime is lower because
of the more involved process for bipolar devices.

VLSI  CIRCUIT  PERFORMANCE 


Anne Park has completed the logic-level design of a high-speed digital
FIR filter to be implemented in 3 µm CMOS. Each chip will contain several
filter stages. A typical image processing application will require on the
order of four chips. The throughput goal is several hundred megabytes per
second. In addition to the fabrication of some fancy hardware, this project
has several high-level goals. The first goal is to try out several theories
and tools, developed at MIT, in the context of a high-performance circuit
design. For instance, Leiserson's retiming techniques have been used to
develop the initial logic-level diagram. Mark Matson's optimizer for
circuit-level design will also be used. The other high-level goal is to
explore methodologies for the expeditious design of high-performance MOS
circuits.

Isaac Bain has been making good progress on a tool for methodology
verification of hierarchically described VLSI circuits. The program is
embedded in SCHEMA. The objective is to be able to specify a circuit
methodology at the start of a design, and have the program check conformance
to that methodology as the design progresses. This will relieve the designer
of the need to write a new methodology checking program each time the
design style is changed. Wiring operators are used to help capture the
designer's intent. The program implementation is now showing signs of life
and can check simple circuits created in a wide range of methodologies.
Typical checks include the number of threshold drops in a pass transistor
network, pullup network topology, Vdd to ground shorts, information sources
and sinks, multiple pullups, charge sharing faults, and races.
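
As an illustration of one such check, the following Python sketch counts
threshold drops in a pass transistor network. It is not the program described
above. It assumes a first-order model in which the high level passed by an
n-channel pass transistor is the lower of its source level and its gate level
minus one threshold, that each node has at most one pass transistor driving
it, and that the network is acyclic. The function names and the dictionary
representation are hypothetical.

```python
def threshold_drops(pass_transistors, node, _cache=None):
    """Worst-case number of threshold drops on the high level at `node`.

    `pass_transistors` maps an output node to a (source, gate) pair for the
    single n-channel pass transistor assumed to drive it. Any node that is
    not a key is assumed to be driven by restoring logic (zero drops).
    """
    cache = {} if _cache is None else _cache
    if node in cache:
        return cache[node]
    if node not in pass_transistors:
        return 0
    source, gate = pass_transistors[node]
    # High level out = min(V_source, V_gate - Vt); in units of Vt below Vdd
    # that is max(drops(source), drops(gate) + 1).
    drops = max(threshold_drops(pass_transistors, source, cache),
                threshold_drops(pass_transistors, gate, cache) + 1)
    cache[node] = drops
    return drops


def threshold_violations(pass_transistors, limit=1):
    """Return the sorted list of nodes whose drop count exceeds `limit`."""
    cache = {}
    return sorted(node for node in pass_transistors
                  if threshold_drops(pass_transistors, node, cache) > limit)


if __name__ == "__main__":
    network = {
        "x": ("a", "phi1"),  # restored signal a passed by clock phi1
        "y": ("x", "phi2"),  # second pass transistor in series
        "z": ("b", "x"),     # gate driven by the already degraded node x
    }
    print(threshold_violations(network, limit=1))
```

In this model a series chain of pass transistors with restored gates costs
only one drop, while a pass transistor whose gate is itself driven through a
pass transistor costs two, which is the case a one-drop methodology would
reject.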

ROUTING  AND  COMPLEXITY 


Professor Ronald Rivest has worked extensively on the problem of
estimating the channel density required, on the average, given the number of
nets known to be present in the channel. The primary motivation for this
research is to estimate the magnitude of the "economies of scale" enjoyed by
large channels relative to small ones. Suppose the "efficiency" of an
instance of a channel routing problem is measured as the ratio of the channel
density of the instance to the number of nets in that instance. For small
channels, this ratio is nearly one on the average, whereas for large channels
it approaches 0.5.

This has some surprising consequences for routing gate arrays and
standard cell arrays: it may be more efficient to avoid the usual "divide and
conquer" approach, which tries to balance the number of nets appearing in each
channel, and instead use a strategy of trying to maximize the number of nets
in (say) the even channels and minimize the number of nets in (say) the odd
channels, since the savings on the even channels may more than compensate for
the loss on the odd channels.

The combinatorics of this problem are believed to be previously
unstudied. In this work (joint with Chuck Fiduccia) a channel is modeled as a
set of 2n pins in a row, paired up into n two-pin nets. All the pins are
assumed to be on the same side of the channel, so that there are no vertical
conflicts. (This assumption is not unreasonable, given that channel density
is being used as the measure of routing complexity.) It is also assumed that
each possible pairing is equally likely to occur. While this assumption is
clearly a bit unrealistic in the usual context where some care has been paid
to module placement, it is a plausible first approximation that should be
usable to help quantify the magnitude of the economies of scale enjoyed by the
large channels. Preliminary results indicate that an n-net channel should
have an expected channel density of

    n/2 + O(n^c)

where  c  is  a  constant  (yet  to  be  determined)  between  1/3  and  2/3. 
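
The model above is simple enough to simulate directly. The following Python
sketch draws uniformly random pairings of 2n pins, computes the channel
density of each pairing, and averages the efficiency ratio. It is only an
illustration of the model as stated; the function names and trial counts are
assumptions and are not taken from the work reported here.

```python
import random


def channel_density(pairing):
    """Largest number of nets whose horizontal span covers any one column.

    `pairing` is a list of (pin, pin) tuples over pin columns 0 .. 2n-1.
    """
    n_pins = 2 * len(pairing)
    delta = [0] * (n_pins + 1)
    for a, b in pairing:
        left, right = min(a, b), max(a, b)
        delta[left] += 1        # net starts occupying a track here
        delta[right + 1] -= 1   # and releases it after its right pin
    density = current = 0
    for change in delta:
        current += change
        density = max(density, current)
    return density


def random_pairing(n, rng):
    """Uniformly random pairing of 2n pins into n two-pin nets."""
    pins = list(range(2 * n))
    rng.shuffle(pins)
    return [(pins[2 * i], pins[2 * i + 1]) for i in range(n)]


def mean_efficiency(n, trials, rng):
    """Average of (channel density / number of nets) over random channels."""
    total = sum(channel_density(random_pairing(n, rng)) for _ in range(trials))
    return total / (trials * n)


if __name__ == "__main__":
    rng = random.Random(1983)
    for n in (1, 2, 10, 100, 1000):
        print(n, round(mean_efficiency(n, 200, rng), 3))
```

For n = 1 the ratio is exactly one. As n grows, the printed ratio should fall
toward 0.5, and the excess over n/2 gives a rough empirical handle on the
exponent c.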

The PI system is developing smoothly. The placement code has been
completed and revised. The "min-cut" and "hardening" phases of placement run
quite quickly and give pleasing results. The special ad-hoc code for pad
placement is running smoothly, and does a good job of determining the number
of sides to use for pads.

The PI power/ground code is being revised to incorporate a new algorithm
that runs the VDD tree inwards from a power ring placed just inside the ring
of pads. This is expected to result in a substantial improvement in the
quality of the resulting power/ground tree. The previous iteration of the
power/ground algorithm was presented at the Design Automation Conference, June
1983, by Andy Moulton.

The stretching/resizing code is being substantially revised to generate
better results. In particular, two new features have been added to this code:
secondary constraints and biases. A secondary constraint is a constraint
which is desired, but optional. The constraint-solving code will violate a
secondary constraint if necessary to satisfy the primary constraints. This
has a number of uses, among the most interesting of which may be the use of
secondary constraints to preserve, if possible, the "nice alignment" of
modules which have a number of pins facing each other across a channel. The
notion of "bias" also has a number of uses, and quantifies the notion of
resolving slack in a favorable direction. The node-elimination algorithm used
to solve the constraint graph manipulates bias in an interesting way so that
some global consideration can be given to minimizing wire resistance, etc.

The new crossing placement algorithm, developed by Chuck Fiduccia and
Rivest, as described in the last progress report, is being implemented and
included in PI. This code is still being developed; there are no experimental
results to report yet.

The channel routing algorithm proposed by Burstein, which uses a
hierarchical approach, is being implemented. This code is also under
development; there are no experimental results to report yet.

A large number of test examples have been run through the portions of the
PI system that are operational. The results are generally produced quite
quickly and are of good quality, although there is a perceived need to polish
the channel routers and improve their overall success rate.

The PI system has been exported to the GE Research Lab in Schenectady,
NY, where it is being studied for possible use.

Professor Charles Leiserson has been developing his ideas about systolic
and semisystolic design. He has identified over a dozen transformation
techniques that can aid in producing efficient systolic and semisystolic
architectures. He has started developing models for understanding some of the
phenomena. For example, he has added multiplexors to the
register/combinational logic model used in his work on retiming.

Leiserson has also been experimenting with designs for complex
arithmetic. With a student, Ray Hirshfeld, he has designed a fixed-point
complex multiplier that is more compact than one can get by routing together
four normal multipliers. Now being studied is whether a three-multiplier
version, which saves area, can be made to run as fast as a four-multiplier
complex multiplier.
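
For reference, the trade between four and three real multiplications can be
seen in the arithmetic alone. The Python sketch below uses one well-known
three-multiplication identity on integer (fixed-point) operands. The report
does not say which three-multiplier scheme is under study, so this particular
identity and the function names are assumptions made for illustration.

```python
def complex_mul_4(a, b, c, d):
    """(a + bi)(c + di) with four real multiplications, two additions."""
    return (a * c - b * d, a * d + b * c)


def complex_mul_3(a, b, c, d):
    """(a + bi)(c + di) with three real multiplications, five additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    # k1 - k3 = ac - bd (real part); k1 + k2 = ad + bc (imaginary part)
    return (k1 - k3, k1 + k2)


if __name__ == "__main__":
    print(complex_mul_4(3, 4, 5, -2))
    print(complex_mul_3(3, 4, 5, -2))
```

The saving of one multiplier is paid for with three extra adders, and the
pre-additions sit in series with the multipliers. That added delay is one
plausible reason the speed of a three-multiplier version is an open question.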

Leiserson has started a software project with Miller Maley and some
undergraduates to build a layout compactor. The compactor will do
one-dimensional compaction with automatic jog introduction. The jogs will be
introduced optimally using a polynomial-time algorithm. The theoretical basis
for the compactor is being developed and the algorithm improved. A software
project is necessary because the real-world performance of the algorithm
would seem to be much better than the current best theoretical analysis
suggests.

With Sandeep Bhatt, Leiserson showed that a problem of orienting
rectangles in slicing floorplans is solvable in polynomial time.

Professor F. Thompson Leighton is studying several VLSI-related problems,
including: conversion of 2-dimensional layouts into 3-dimensional layouts,
design of networks for very fast parallel computation, upper and lower bounds
for sorting and packet routing, bounds and algorithms for channel routing, and
the problem of decomposing a graph into a small number of stacks.

In the area of 3-dimensional placement and routing, Profs. Leighton and
Rosenberg (Duke University) have developed algorithms for transforming a
2-dimensional layout with area A and maximum edge length L2 into an H-level
1-active-layer layout with volume V = O((A/H) log A) and maximum edge length
L3 = O(L2/H). These bounds are close to the best possible. Moreover, the
results indicate that, for many circuits, the added ability to locate
transistors on multiple layers (as opposed to only wires) does not decrease
the volume needed to embed a circuit in a chip.

In the area of parallel computation, Prof. Leighton has developed a
simple network (the mesh of trees) that can solve a variety of problems
(including sorting, Fourier transform, and matrix multiplication) in O(log N)
steps. Current research is directed towards finding additional applications
for this network as well as for the related multidimensional mesh of trees and
the shuffle-exchange graph. Particular effort is being devoted to finding
fast, area-efficient networks for packet routing and sorting. As such
networks can efficiently simulate an "ideal computer," their discovery could
have important applications to the design of supercomputers. Initial work in
this area has also led to the discovery of improved lower bounds for the
communication complexity of sorting.

In the area of channel routing, Dr. Baker (Bell Labs), Sandeep Bhatt (a
contract-supported graduate student) and Prof. Leighton have developed a
linear-time approximation algorithm for Manhattan routing. Unlike the many
heuristics previously discovered for Manhattan routing, the new algorithm is
guaranteed to produce a routing that has channel width at most a constant
factor times the optimal channel width. As part of the work, Baker, Bhatt and
Leighton formalized the notion of channel flux which (like channel density) is
a lower bound on channel width. Initial work with Dr. Pinter (Bell Labs) on
the related problem of unrestricted 2-layer channel routing suggests that it
may be possible to route any 2-point net problem with density D in a channel
of width D + O(D^(2/3)). This is nearly a factor of 2 better than the best
previous bound and seems to require only a few unit-length vertical wire
overlaps per net.

In the area of graph decomposition, Dr. Chung (Bell Labs) and Profs.
Leighton and Rosenberg (Duke) are studying methods for "embedding a graph into
a book" so that the nodes of the graph are arranged in a line along the spine
of the book and so that the edges are drawn without crossing on the pages of
the book. The problem is to minimize the number of pages needed as well as
the size of the pages. Advances on this problem (which is known to be
NP-complete) will have direct application to a variety of wire routing
problems for which no good solutions are known. In one application, each page
of the book corresponds to a layer of interconnect on a chip. In two other
applications, each page corresponds to an electrical last-in-first-out stack
of transistors on a chip.
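
Finding a good embedding is the hard part; checking a proposed one is easy.
The Python sketch below verifies that, for a given order of nodes along the
spine and a given assignment of edges to pages, no two edges on the same page
cross. The function name and data layout are assumptions made for
illustration and are not drawn from the work described above.

```python
from itertools import combinations


def is_valid_book_embedding(spine, pages):
    """Return True if no two edges assigned to the same page cross.

    `spine` is the list of nodes in their order along the spine.
    `pages` is a list of pages, each a list of (u, v) edges.
    """
    position = {node: i for i, node in enumerate(spine)}
    for edges in pages:
        spans = [tuple(sorted((position[u], position[v]))) for u, v in edges]
        for (a, b), (c, d) in combinations(spans, 2):
            # Two edges on one page cross exactly when their endpoints
            # strictly interleave along the spine.
            if a < c < b < d or c < a < d < b:
                return False
    return True


if __name__ == "__main__":
    spine = [0, 1, 2, 3]
    k4 = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2), (1, 3)]
    print(is_valid_book_embedding(spine, [k4]))
    print(is_valid_book_embedding(spine, [k4[:5], k4[5:]]))
```

With this spine order the complete graph on four nodes does not fit on one
page, because edges (0, 2) and (1, 3) interleave, but it does fit on two.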

ENGINEERING  OF  INTEGRATED  SYSTEMS 


The  SCHEME-81  chip  has  been  tested  and  found  to  be  functional. 

The first chips were received from M27PA1 almost 9 months ago. Jon Taft
made a beautiful interface for using a SCHEME chip as a computer, interfacing
the SCHEME chip with a sophisticated clock generator, single-stepping
hardware, history-capturing hardware, and main memory cards from the lab's
production run of Lisp Machine memories. The first chips were tested with
very discouraging results. There seemed to be little life in them and it was
hard to tell what had happened. The voltages coming out of the chip were very
low. The highest outputs were about 2.3 volts, barely triggering the TTL
circuitry. This made the testing very difficult. After much deduction and
testing, it was hypothesized that the high diffusion resistance (45 ohms per
square in M27PA1) was the source of the problem. The high diffusion
resistance problem was accentuated by the very good K' of the transistors in
the three micron process. This led to high currents in the inverters, and
hence more voltage drop in the ratioed logic paths.

The pads, between the pad ring transistors and the pad logic, were
stretched to accommodate the huge ground bus of the chip, but the pad
electronics were powered by diffusions which cross the ground bus. Since two
inverters occupy the pad logic, one or the other was always on. Simple
calculations of voltage drop across the minimum-width diffusion showed that
this was the source of the low output voltage for the pads in the "high" logic
state. The same source of voltage drop also affected the internal logic high
levels on some critical tri-state pad outputs. For a while, it was thought
that these bad internal levels were the reason the parts were nonfunctional.

The observed failures could not be correlated with this theory, however,
and eventually the real problem was discovered. The diffusion resistance also
affected the zero level on superbuffer drivers for the main microcode PLA
OR-plane. The voltage drop through the "ground" diffusion was calculated at
about 0.9 volts under normal conditions, far more than is needed to
destroy the logic zero level needed in the PLA. The machine could be
sequenced and would do simple things if Vbb was carefully adjusted to raise
the threshold of PLA transistors, compensating for the bad zero levels, but
this only worked marginally, since the one levels were also bad. (The same
cells would have worked fine on the SCHEME-79 process, where the diffusion
resistance was only 10 ohms per square and the transistors had a lower beta.)
The message here for other designers in the 3 micron process should be clear:
Don't ignore the parasitic diffusion resistances in power, and especially
ground, lines. Even small numbers of squares of diffusion can be enough to
destroy the logic zero levels in the gates.
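
The arithmetic behind this warning is just Ohm's law applied to a sheet
resistance. The Python sketch below shows the form of the calculation. The
sheet resistances (45 and 10 ohms per square) are the figures quoted above;
the 2 mA current and the 10 squares of diffusion are illustrative values
chosen here, not measurements from the SCHEME-81 chip, and the function names
are hypothetical.

```python
def diffusion_drop(sheet_resistance, squares, current):
    """Voltage drop in volts along a diffusion run.

    sheet_resistance: ohms per square
    squares: length of the run divided by its width
    current: amperes flowing through the run
    """
    return sheet_resistance * squares * current


def max_squares(sheet_resistance, current, allowed_drop):
    """Largest number of squares that keeps the drop within allowed_drop."""
    return allowed_drop / (sheet_resistance * current)


if __name__ == "__main__":
    current = 0.002   # assumed: 2 mA drawn through the ground diffusion
    squares = 10      # assumed: 10 squares of minimum-width diffusion
    for name, r_sheet in (("M27PA1", 45.0), ("SCHEME-79 process", 10.0)):
        drop = diffusion_drop(r_sheet, squares, current)
        print(f"{name}: {drop:.2f} V")
```

Under these assumed values the same layout loses several times more voltage
on the higher-resistance process, which is the effect described above.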

After  this  problem  was  discovered,  the  cells  were  redesigned  to  improve 
the  performance  on  high  diffusion-resistance  processes.  This  is  now  complete, 
and  simulation  is  in  progress,  in  preparation  for  submission  of  the  revised 
design. 

This is where things stood until a few months ago, when 11 bonded chips of
the original design processed on M33MBA1 were received. The diffusion
resistance on that process is about half of the resistance of M27PA1. These
chips were tested and they appear to show vastly improved behavior,
vindicating the hypothesis. The new chips have reasonable high levels (4
volts) and have reliable behavior. One of them almost works completely! It
can sequence reliably, conditionally branch, dispatch on data stored in the
registers, and make memory references. It fails in some states, though this
is probably a random processing failure. The initial yield was expected to be
low, so it is not surprising that no chip works perfectly, but in a large
collection of such chips a few good parts would be found. The other good news
is that the chips appear to be fast, with a reliable clock time of 1
microsecond. This meets the original design spec of the chip and would admit
a very impressive performance in running Scheme programs.


In  using  RNL  on  the  redesigned  layout,  the  parasitic  resistances  of  the 
clock  distribution  network  became  apparent  as  a  major  performance  limiting 
factor.  Others  should  be  concerned  about  the  parasitic  clock  distribution 
resistance  in  large  designs  if  they  are  after  performance. 

A trick used in most industrial design, which has not migrated into the
university community, is to use the buried contact layer as a lower-resistance
interconnect in processes with shallow high-resistance diffusion layers. The
buried contact, since it is doped with phosphorus rather than arsenic, tends
to diffuse much more deeply, and hence to be much lower in diffusion
resistance than the normal shallow arsenic diffusions. For power distribution
and for clock crossunders this can be a valuable technique for reducing the
parasitic resistance. Expect 7-10 ohms per square for buried contact material,
rather than the 40-60 ohms per square for shallow diffusion.

During the past six months, work has accelerated on the Schema system.
The overall specification has been completed and work has turned to
implementing the system. The system is divided into three layers. The first
layer consists of the databases and tools for interacting with the databases.
The second layer contains analysis routines, while the third layer contains
the synthesis software.

The first layer of software is essentially in place and usable. Besides
the databases for dealing with topological specifications, there is a
schematic capture system that allows the designer to graphically specify the
topology. Appropriate mechanisms exist for representing waveforms and device
models (including process corners) and for interacting with different
simulators. Work is proceeding on completing and polishing the databases and
graphics system and enhancing them in several ways. Much of the polishing is
being done by Jeff Eisen. Kent Pitman has begun studying the problem of
dealing with a distributed design database. Doug Alan has been developing a
procedural representation for simulation specification.

The design of the analysis layer software has begun. This layer is split
into several components. The component currently being worked on is
responsible for DC analysis. The purpose of DC analysis is to allow the
designer to specify DC electrical parameters (voltages, currents, power, noise
margins) which the system will use to deduce initial values for device sizes.
In addition it will derive these DC parameters from a completely sized
schematic (with the help of SPICE). This has two advantages. First,
topological specifications are insulated from process variations. Second,
although the initial device sizes may be modified as the design is polished,
Schema has available detailed electrical information about the intent of the
designer, which can be used to validate the final design.

The key problem in doing constraint-based circuit analysis, as described
by Stallman and Sussman, is solving the system of non-linear equations that
results. Unlike those of bipolar devices, the first-order device equations of
MOS transistors are polynomials. This allows use of the Gröbner basis
algorithm for ideals to manipulate the constraint equations. This is a far
more effective technique than the resultant techniques used by Sussman. Hand
simulations indicate that these techniques should deal effectively with
circuits of a dozen or so transistors.
Completion and integration of the DC analysis system will proceed this fall,
and attention will then be turned towards AC analysis during the spring as the
simulation subsystem is completed.
DIOGENES:

A METHODOLOGY FOR DESIGNING
FAULT-TOLERANT VLSI PROCESSOR ARRAYS


F.R.K. Chung    F. Thomson Leighton    Arnold L. Rosenberg

Bell Laboratories    MIT    Duke University


ABSTRACT. In [18], Rosenberg introduced by example
the Diogenes approach to the design of fault-tolerant
VLSI processor arrays. In this paper, we uncover the
principles underlying the approach, and we derive from
them a strategy for producing Diogenes designs for
arbitrary interconnection networks. We use the strategy to
derive optimal Diogenes designs of trees, grids, X-trees,
and Boolean n-cubes, as well as surprisingly efficient
designs of Benes permutation networks.


1. INTRODUCTION

We study here one facet of the problem of designing
fault-tolerant microcircuitry, in an environment tailored
to a popular VLSI architecture: arrays of identical
processing elements (PEs). Our specific problem is the
following.

The P(A;p,u) Problem. We want to construct an n-node
array A of identical PEs. By using conservative design
rules, we may assume that we can fabricate wires and
switches perfectly. But we wish to design PEs
aggressively, to maximize density and speed. As a result, the
PEs experience debilitating faults independently, with
probability p. We want to design a fault-tolerant array
of PEs that

• can be configured to simulate the array A;

• utilizes at least the fraction u of the fault-free
PEs (so we fabricate n/((1-p)u) PEs to get the
desired n-PE array);

• admits an efficient layout; and

• utilizes a switching mechanism that is simple (in
structure and in ease of configuration).
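
The fabrication count in the second requirement is simple arithmetic. The
sketch below is only an illustration: the function name is hypothetical, and
rounding up to a whole number of PEs is our assumption (the problem statement
gives the expression n/((1-p)u) without rounding).

```python
import math


def pes_to_fabricate(n, p, u):
    """PEs to fabricate so that n usable PEs are expected to remain.

    n: size of the desired array; p: fault probability per PE;
    u: fraction of the fault-free PEs that the design can utilize.
    Rounding up is an assumption made for this sketch.
    """
    if not (0 <= p < 1) or not (0 < u <= 1):
        raise ValueError("need 0 <= p < 1 and 0 < u <= 1")
    return math.ceil(n / ((1 - p) * u))


print(pes_to_fabricate(127, 0.2, 1.0))
print(pes_to_fabricate(127, 0.2, 0.9))
```

A design with lower utilization u pays for it directly in silicon: the same
127-PE array needs more fabricated PEs when only 90% of the good ones can be
used.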

Our specific objective is to study and extend the
Diogenes [19] approach to the P(A;p,u) problem. The
qualities of the approach (notably, transparency to the
PE designer, simplicity of configuration, and high
utilization of fault-free PEs) suggest the desirability of
studying the approach with an eye to applying it to a
wide variety of interconnection networks. The first
results of our study are reported here. Related
theoretical issues occupy [9].

We proceed as follows. By analyzing a sample
design, we uncover the principle underlying the
Diogenes approach. We use that principle to derive a
strategy for producing Diogenes designs of arbitrary
interconnection networks. We illustrate the strategy by
deriving optimal Diogenes designs of arbitrary trees, of
grids, of X-trees, and of Boolean n-cubes, as well as a
surprisingly efficient Diogenes design of Benes
permutation networks [3]. The paper closes with research
questions awaiting resolution.

Related Work. The techniques that have been
proposed in the literature for solving the P(A;p,u) problem
use one of two basic strategies. The schemes in
[2.9.19,13.19.92] incorporate into each PE a switching
element that can connect that PE to some fixed
repertoire of potential neighbors. Appropriate switch setting
in the fault-free PEs interconnects some fraction of the
good PEs to realize the ideal array. The schemes in
[4.11.19.19.19,91] posit a switching network disjoint from
the PEs. PEs are constructed as if for the ideal array,
but are interconnected through the switching network,
rather than directly. The scheme in [10] employs a
hybrid strategy.

There have been a few papers that analyze rather
than present design methods. [17] evaluates four
approaches for designing fault-tolerant linear arrays.
His main conclusion is that such evaluations cannot be
absolute; one method may be preferred when designing
small arrays of large PEs whereas another is superior
for large arrays of small PEs. [14] derives a model for
assessing the cost of a given design strategy. [90]
presents evidence that the internal-switch strategy
produces designs that consume too much layout area to be
considered for any but the smallest arrays.

2. THE DIOGENES DESIGN APPROACH

2.1. An Informal Description

The major aims of the Diogenes design approach
are:

• to render the design of the fault-tolerant network
transparent to the designer of the PEs;

• to construct a configuration mechanism that is
reconfigurable and as simple as possible to "program" to
the desired structure;

• to enhance testability at a system level by building
into every array a scan-in/scan-out mechanism for
isolating and accessing each PE;

• to utilize (to the extent allowed by array structure) all
fault-free PEs.

The approach can be summarized as follows.
Figure 1. One cell of the Diogenes layout of the depth-d complete binary
tree: (a) unconfigured; (b) configured for a faulty PE; (c) configured for
a good leaf-PE; (d) configured for a good nonleaf-PE.
One wishes to solve the P(A;p,u) problem for some
given array A. One begins by fabricating n/((1-p)u) PEs
in a (logical, if not physical) line, with some number of
"bundles" of wires running above the line of PEs. One
then scans along the line of PEs to determine which are
faulty and which are fault-free. As each good PE is
encountered, it is hooked into the bundles of wires
through a network of switches, thereby connecting that
PE to the fault-free PEs that have already been found
and preparing it for eventual connection to those that
will be found. One stops looking for good PEs once n
have been found. (Alternatively, one could look for all
the good PEs, and build the largest array of the given
structure that one can.)


For illustration, consider an example from [19], a
7-node complete binary tree. This simple network
structure needs only one bundle of wires, and that
bundle need contain only three wires.

Note. To simplify exposition, we depict arrays with
unit-bandwidth communication links. Removing
this restriction is just a clerical matter.

One cell of a Diogenes layout of the tree appears in Fig.
1(a).

Note. The lines above the PE are the single bundle
needed for the layout. The switches, represented
for simplicity by pass transistors, are set by two
control lines, GOOD that is high when the PE is
fault-free and low when it is faulty, and LEAF which
is high when the PE is to act as a leaf of the tree
and low otherwise. Figs. 1(b)-(d) indicate how the
switches are set.

The layout of Fig. 1(a), described in terms of a depth-d
tree, was derived as follows. We start out with the PEs in
a line. We construct a single bundle with wires
numbered 1,2,...,d. We test the PEs so that we know which
are good and which are faulty. Next, we proceed down
the line of PEs from right to left. As we encounter a
good PE that is to be a leaf of the tree (a simple
numerical formula tells us which should be leaves), we have it
connect up to line 1 in the bundle (thereby preparing it
to connect to its father in the tree), simultaneously
having lines 1 through d-1 "shift up" to "become" lines 2
through d, respectively; switches disconnect the left
parts of the lines from the right parts so that node-to-node
connectivity remains correct. The bundle has thus
behaved like a stack being PUSHed; see the left side of
Figs. 1(c,d). When we encounter a good PE that is to be
a nonleaf of the tree, we connect it to the stack/bundle
in two stages. First, we have the PE connect up to lines


Figure 2. (a) The depth-3 complete binary tree. (b) The width-3
Diogenes (preorder) linearization of the tree.


1 and 2 of the bundle (thereby connecting the node to
its sons in the tree), simultaneously having lines 3
through d "shift down" to "become" lines 1 through d-2,
respectively; again switches assure that proper
node-to-node connectivity is maintained. The bundle has
here behaved like a stack being POPped; see the right
side of Fig. 1(d). Second, we have the PE PUSH a
connection onto the stack, to prepare for eventual
connection to its father in the tree. The process we have
described here lays the tree out in preorder (cf. Fig. 2).
Hence, a d-wire stack/bundle suffices to lay out a
depth-d complete binary tree.
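
The PUSH/POP behaviour just described can be mimicked in software. The
following is a minimal sketch, not the hardware mechanism itself: the function
names are hypothetical, list index 0 stands for the first PE scanned, and the
recursive rule that generates the LEAF bits is our own stand-in for the
"simple numerical formula" mentioned above.

```python
def leaf_pattern(depth):
    """LEAF bits for a depth-`depth` complete binary tree, in scan order.

    Scan order is the reverse of preorder, so both sons of a node are
    scanned before the node itself.
    """
    if depth == 1:
        return [True]
    sub = leaf_pattern(depth - 1)
    return sub + sub + [False]  # right subtree, left subtree, then root


def configure(good_bits, leaf_bits):
    """Scan a line of PEs; return (father-of map, widest stack seen).

    good_bits[i] models the GOOD line of PE i; leaf_bits supplies the LEAF
    line of each successive good PE. Faulty PEs are skipped.
    """
    stack, father, widest = [], {}, 0
    roles = iter(leaf_bits)
    for pe, good in enumerate(good_bits):
        if not good:
            continue              # faulty PE: the bundle passes straight by
        try:
            is_leaf = next(roles)
        except StopIteration:
            break                 # enough good PEs have been found
        if not is_leaf:
            for _ in range(2):    # POP twice: connect to the two sons
                father[stack.pop()] = pe
        stack.append(pe)          # PUSH: await connection to the father
        widest = max(widest, len(stack))
    return father, widest


# Nine PEs, two of them faulty, configured as a 7-node (depth-3) tree.
good = [True, False, True, True, True, False, True, True, True]
father, widest = configure(good, leaf_pattern(3))
print(father, widest)
```

In this sketch the stack never grows beyond d entries for a depth-d tree,
which matches the claim that a d-wire bundle suffices.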

Note. Our design strategy will require highlighting
certain edges of the network being laid out, as well
as adding new edges to it. Highlighted edges in
Figs. 2-4 are represented by bold lines; added
edges are represented by dotted lines. The
significance of both kinds of special edges will be
explained in Section 4.

The preceding example should suffice to introduce
the Diogenes approach. The designs in [10] simplify the
problem of configuring the network by organizing their
wire bundles as either stacks (as in our example) or
queues. For example, two bits of information (= control
lines) per PE suffice to configure a line of PEs into
any-depth complete binary tree: one bit tells whether or not
a PE is good; the other tells whether or not it's a leaf. A
less structured bundle (e.g., a crossbar) would require a
number of bits per PE proportional to the depth of the
tree.

2.2. Stack-Induced Layouts

The Diogenes design approach is distinguished from
other external-switch approaches (e.g., [4,18,21]) in its
structuring switches so that wire bundles behave as
stacks or queues. It is this organizing principle that we
exploit to extend the approach to arbitrary
interconnection networks.

We restrict attention here to Diogenes designs that
organize wire bundles as stacks. Stack-bundles are (as
pointed out in [19]) easier to implement than
queue-bundles, thereby reinforcing our quest for easily
applicable results. Moreover, it is our experience that
learning to reason about stacks helps one to reason about
queues.

Finally, an avenue for simplification: The Diogenes
"recipe" has two parts: a faulty PE is passed by without
hooking it into any stack/bundle; a fault-free PE is
hooked into the bundles in some relatively complicated
way. The former prescription is not interesting: one
ignores bad PEs by straightforward use of the GOOD
control lines that appear in every Diogenes design (cf.
Fig. 1). The interesting aspect of Diogenes designs is how
they use stacks to realize interconnections among the
good PEs. Recognizing this, we simplify our study by
ignoring the GOOD lines and their role in network
configuration. This relegates to the background the
fault-tolerating aspect of the motivating problem and
concentrates solely on the problem of how to use stacks
to configure a line of (fault-free) PEs into any desired
array structure.

The essence of having a wire-bundle act as a stack
is that inter-PE connections made using that bundle
never cross. (This is both necessary and sufficient.) Our
topic of study thus reduces to the following. As is
customary, we view arrays as undirected graphs (cf. [12]).

The Formal Layout Problem. To partition the edges of
the graph G and to lay G out in the plane in such a way
that:

• the vertices of G lie on a line;

• all edges of G lie above the line;

• no two edges in the same block of the partition cross.

In view of our earlier remark, it is clear that our
problem of realizing arrays using stacks is equivalent to
the formal problem just stated. A third formulation will
be useful for insight.

The graph G is outerplanar if its vertices can be
placed on a circle in such a way that the edges of G are
noncrossing chords of the circle.

Proposition 1. [8] A graph is realizable with one stack if,
and only if, it is outerplanar.

We are, thus, studying multi-outerplanar graphs:

The graph G is k-outerplanar if it is the union of k
outerplanar graphs whose outerplanarity is
witnessed by the same layout of Vertices(G) on a
circle.

Proposition 2. [8] A graph is realizable with k stacks if,
and only if, it is k-outerplanar.
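
The noncrossing condition in the Formal Layout Problem is easy to test
mechanically. The sketch below is an illustration under our own conventions:
`order` lists the vertices along the line, `blocks` holds one edge list per
stack, and both function names are hypothetical.

```python
def crosses(e, f, pos):
    """True if edges e and f, drawn as arcs above the line, cross."""
    a, b = sorted((pos[e[0]], pos[e[1]]))
    c, d = sorted((pos[f[0]], pos[f[1]]))
    return a < c < b < d or c < a < d < b


def is_stack_layout(order, blocks):
    """Check that no two edges in the same block of the partition cross."""
    pos = {v: i for i, v in enumerate(order)}
    for edges in blocks:
        for i, e in enumerate(edges):
            for f in edges[i + 1:]:
                if crosses(e, f, pos):
                    return False
    return True


order = ["a", "b", "c", "d"]
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
# The 4-cycle needs one stack; adding both diagonals (K4) needs two.
print(is_stack_layout(order, [cycle]))
print(is_stack_layout(order, [cycle + [("a", "c"), ("b", "d")]]))
print(is_stack_layout(order, [cycle + [("a", "c")], [("b", "d")]]))
```

Edges that share an endpoint, or that are nested one inside the other, are not
counted as crossing; only properly interleaved endpoints are.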

2.3. The Quality of a Diogenes Layout

Three parameters measure the quality of a
Diogenes layout of a graph G:

1. the number of stacks employed in the layout;

2. the (a) individual and (b) cumulative stackwidths
(= number of lines) of the stacks used in the layout;

3. the number of control bits needed to configure the
layout: given the layout, each vertex v of G has an
associated vector of pairs of nonnegative integers, called its
type,

    $T(v) = \langle (L_1, R_1), \ldots, (L_k, R_k) \rangle,$

where each L_i (resp., R_i) is the number of edges incident to v
that connect via stack i to vertices lying to the left
(resp., right) of v. This measure is the base-2 logarithm
of the number of distinct vertex-types in the layout of G,
i.e., the number of "control" bits needed to configure
the layout in the presence of faults.
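
The type vectors and the resulting bit count can be computed directly from a
layout. This is a sketch under stated assumptions: the function names are
hypothetical, the logarithm is rounded up to a whole number of bits, and the
count covers only the distinct vertex-types (it does not add a separate bit
for the GOOD line).

```python
import math


def vertex_types(order, stacks):
    """Map each vertex to its type: one (L_i, R_i) pair per stack i."""
    pos = {v: i for i, v in enumerate(order)}
    types = {v: [[0, 0] for _ in stacks] for v in order}
    for i, edges in enumerate(stacks):
        for u, v in edges:
            left, right = (u, v) if pos[u] < pos[v] else (v, u)
            types[left][i][1] += 1   # edge runs to the right of `left`
            types[right][i][0] += 1  # edge runs to the left of `right`
    return {v: tuple(map(tuple, t)) for v, t in types.items()}


def control_bits(order, stacks):
    """Base-2 logarithm of the number of distinct types, rounded up."""
    distinct = set(vertex_types(order, stacks).values())
    return math.ceil(math.log2(len(distinct)))


# 7-node complete binary tree (heap numbering) laid out in preorder.
order = [1, 2, 4, 5, 3, 6, 7]
tree = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)]
print(vertex_types(order, [tree]))
print(control_bits(order, [tree]))
```

For this layout the sketch finds three types (root, interior node, leaf), so
two bits are enough to tell them apart.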

We weight these measures in decreasing order of
importance when "optimizing" layouts; in [5] we study
tradeoffs among them.

3. DIOGENES LAYOUTS OF TREES

Our layout of the n-node complete binary tree is
optimal with respect to all three quality measures: its
one stack respects the outerplanarity of the tree (Prop.
1); its log n width respects the lower bound of [5]; and
its two control bits per PE respect our insistence on
fault tolerance. We can do almost as well with arbitrary
trees.

Fact 1. (a) Any n-node k-ary tree admits a Diogenes
layout with one stack of width W(n) = (k/2) log n.

(b) There is a fixed layout using a single width-W(n)
stack and using 1 + 2 log(k+1) control bits per PE, that
can be configured to any k-ary tree having n or fewer
nodes.

Proof Sketch. Let G be a graph. One adds a fringe to a
vertex v of G by appending to v a line of (possibly 0)
vertices:

    $v - v_1 - v_2 - \cdots - v_n, \quad n \ge 0.$

A fringing of G is a graph obtained by adding a fringe to
each vertex of G.

Concentrate on one vertex v of G. Say that when G
is laid out, v is flanked by vertices u and w. Let v have
two fringes, v_1,...,v_n and v'_1,...,v'_m (one or both can
be empty). Lay the fringes out in the indicated order,
between either u and v or v and w. To choose the sides
and stacks, look at v's type. Put the first fringe on the
side and the stack having the smallest integer entry in
v's type; place the second fringe using the smallest
entry in v's (now altered) type. This increases the
cumulative stackwidth by at most 1, while leaving the
stacknumber unchanged.

Fact 1 now follows by verifying that any k-ary tree T
can be "built" by levels, by starting with a single vertex
and "double"-fringing the graph (k/2) log n times.
The number of control bits follows from counting the
number of distinct vertex-types when all vertices have
degree at most k+1. []

4. A GENERAL LAYOUT HEURISTIC

The layout technique of Fact 1 builds explicitly on
the structure of the graphs being laid out. It would be
useful to know what to look for in an interconnection
network's structure to help one find efficient layouts of
arrays of that structure. Experience from numerous
Diogenes layouts has led us to the following heuristic.

The graph G is hamiltonian if there is a cycle in the
graph that meets each vertex just once. The graph G' is
an augmentation of the graph G if G' is obtained by
adding k >= 0 edges to G.

4.1. A Heuristic Layout Procedure.

To find a Diogenes layout for G:

1. Augment G (if necessary) so that it has a
hamiltonian cycle.

2. Cut the cycle to obtain a layout of G in a line.

3. Assign edges to stacks using edge coloring as in
tP' 
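
Steps 2 and 3 can be sketched in a few lines. Note the substitution: the
edge-coloring method the paper cites is not reproduced here; the code below
uses plain first-fit coloring (each edge goes into the first stack where it
crosses nothing), which need not use the fewest stacks in general. All names
are hypothetical, and the 3x3 grid with a row-by-row cycle is only an example
input.

```python
def crosses(e, f, pos):
    """True if edges e and f, drawn as arcs above the line, cross."""
    a, b = sorted((pos[e[0]], pos[e[1]]))
    c, d = sorted((pos[f[0]], pos[f[1]]))
    return a < c < b < d or c < a < d < b


def assign_stacks(order, edges):
    """First-fit: put each edge in the first stack where it crosses nothing."""
    pos = {v: i for i, v in enumerate(order)}
    stacks = []
    # Longest edges first, so that nested edges tend to share a stack.
    for e in sorted(edges, key=lambda e: -abs(pos[e[0]] - pos[e[1]])):
        for stack in stacks:
            if not any(crosses(e, f, pos) for f in stack):
                stack.append(e)
                break
        else:
            stacks.append([e])
    return stacks


# Step 2 for a 3x3 grid: the row-by-row sweep, cut into a line.
n = 3
order = [(r, c) for r in range(n)
         for c in (range(n) if r % 2 == 0 else reversed(range(n)))]
edges = [((r, c), (r, c + 1)) for r in range(n) for c in range(n - 1)]
edges += [((r, c), (r + 1, c)) for r in range(n - 1) for c in range(n)]
stacks = assign_stacks(order, edges)  # step 3
print(len(stacks))
```

On this small grid the first-fit rule happens to separate the edges joining
rows 0 and 1 from those joining rows 1 and 2, because those two families
interleave along the line.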


As one example of the heuristic, our layout of
complete binary trees results from applying the procedure
to "preorder" augmentations of the trees. (See Fig. 2(a):
the chosen cycle consists of bold and dotted lines.)
Other one-stack layouts of trees exist (cf. [7]), but none
has smaller width or number of control bits.

4.2. Origins of the Heuristic

The heuristic had two origins. First, the heuristic
embodies the proof of Proposition 1. Second, it
embodies the proof of the following generalization of
Proposition 1 to a wide class of planar graphs.

A graph is subhamiltonian if it has a planar
hamiltonian augmentation.

Proposition 3. [6] The graph G is two-stack realizable
(= 2-outerplanar) if, and only if, it is subhamiltonian.

4.3. Applications of the Heuristic.

Square Grids

The augmented cycle formed by row-by-row sweeps
in a square grid, as indicated in Fig. 3(a), leads to the
layout of Fig. 3(b), which is optimal in number of stacks
(the grid is planar but not outerplanar), stackwidth (see,
e.g., [18]), and number of node types (the layout
distinguishes only between east-to-west and west-to-east rows
of the grid).

Fact 2. The n x n square grid admits a two-stack Diogenes
layout with stackwidth n and with two node types. This
realization is optimal in all three parameters.

X-Trees

The depth-d X-tree X(d) is the augmentation of the
depth-d complete binary tree that adds edges going
across each level of the tree; see Fig. 4(a).

X(d) has cutwidth d and is subhamiltonian, but not
outerplanar. Thus the best possible Diogenes layout
would use two stacks of width d. It is very hard to find a
two-stack layout of stackwidth smaller than roughly 2^d.
(All obvious hamiltonian cycles lead to this enormous
width.) The hamiltonian augmentation of X(d) of Fig.
4(a) leads to the stackwidth-3d two-stack layout of Fig.
4(b).

Fact 3. X(d) admits a Diogenes layout with two stacks,
one of width 2d and one of width 3d. This realization is
optimal in stacknumber and within a constant factor of
optimal in stackwidth.

The only subtlety here is to verify the claimed
stackwidths. As part of our sketched verification, we
describe the layout more formally. We proceed by
induction. Say that we have a layout of X(d-1) with the
claimed parameters and the following form. We depict
the layout schematically by its linearization of the
vertices, together with a few relevant edges. For simplicity,
we draw edges in stack 1 above the line, those in stack 2
below the line.

Layout 1: (linearization diagram not recoverable from the source)

Here r,s,t are, respectively, the root of X(d-1) and its left
and right sons; α,β are the strings comprising the rest of
the tree's vertices. Assume for induction that in Layout
1: (1) the left spine nodes (= leftmost nodes at each
level) of X(d-1) appear in leaf-to-root order in α; the
right spine nodes [the rightmost nodes at each level]
Figure 3. The square grid and its Diogenes linearization.

Figure 4. (a) The depth-4 X-tree X(4). (b) A two-stack, width-4 Diogenes
linearization of X(4).


appear (nonconsecutively) in root-to-leaf order in β; (2)
the nodes r,s,t, and all of the left and right spine nodes
are exposed from the bottom in the sense that no edges
pass totally under them; (3) the width of stack 1 is
<= 2d-2; (4) the width of stack 2 is 0 below the left spine
nodes and is <= 3k-3 to the right of the level-(d-k-1)
spine nodes. Take a second copy of Layout 1:

Layout 2: (linearization diagram not recoverable from the source)

The layout of X(d) [whose vertex-set is the union of the
vertex-sets of its two depth-(d-1) sub-X-trees, plus a
root node r*] is obtained from the indicated layouts by
composing them (the composite diagram is not recoverable
from the source).

A careful analysis of the composite layout extends the
induction. Analysis of small trees completes the
induction, which establishes our claims.


Let n be a power of 2. The n-input Benes network
B(n) is defined inductively as follows; see Fig. 5(a).

• B(2) is the complete bipartite graph K_{2,2} on two input
vertices and two output vertices.

• B(n) is obtained from two copies of B(n/2), n new input
vertices, and n new output vertices. For each
1 <= k <= n/2, one adds edges creating one copy of
K_{2,2} whose "inputs" are two of the new input vertices
and whose "outputs" are the k-th input vertices of the two
copies of B(n/2), and one copy of K_{2,2} whose "inputs"
are the k-th output vertices of the two copies of B(n/2)
and whose "outputs" are two of the new output vertices
(primed vertices come from the second copy of B(n/2)).

Benes networks are nonplanar, hence require at
least three stacks. We have not yet achieved this bound,
but we have found a six-stack realization, by means of
the hamiltonian cycle that alternates running up and
down the "columns" of inputs and outputs of B(n); see
Fig. 5(a). We use three stacks to connect each "column"
of vertices to the next; and we alternate sets of three
stacks as we proceed along the graph. It is surprising
that any family of graphs capable of realizing all
permutations can be laid out with a fixed number of stacks.

Fact 4. B(n) admits a Diogenes layout using six stacks,
each of width n. This realization is within a factor of 2
of optimal in stacknumber and within a constant factor of
optimal in stackwidth.

The same layout strategy yields layouts of
comparable efficiency for structural relatives of B(n), including
(log n)-stage cyclic shifters.

The Boolean n-Cube

The Boolean n-cube C(n) has as vertices the set of
all length-n binary strings. String-vertices are adjacent
just when they have unit Hamming distance. Thus C(n)
has 2^n vertices and n2^(n-1) edges. Since C(n) is hard to
visualize for n>3, we describe its efficient layout in
terms of strings rather than the graphical medium of
hamiltonian cycles.

Fact 5. C(n) is n-stack realizable, with one stack of
width 2^i for each 0 <= i < n. This realization is within a
factor of 2 of optimal in both stacknumber and cumulative


Figure 5. (a) The 8-input Benes network. (b) A layout of the
first three levels of the network.

stackwidth.

The lower bound on stacknumber is immediate from
three facts: (a) Stacknumber(C(n)) >= the number of
outerplanar graphs into which C(n) can be decomposed;

(b) an N-vertex outerplanar graph has at most 2N edges;

(c) C(n) has n2^(n-1) = (N/2) log N edges. The lower bound on
cumulative stackwidth is easy to derive.

The upper bound is seen most easily by describing
inductively the linearization of C(n)'s vertices.

• C(1)'s vertices are laid out as follows:

    $0 \quad 1$

so one width-1 stack suffices.

• Assume that C(n) is realized with n stacks of widths
1, 2, ..., 2^(n-1), with the linearization (letting N = 2^n)

    $A_1 \; A_2 \; \cdots \; A_N,$

each A_i being a distinct length-n binary word. The
following layout for C(n+1):

    $0A_1 \; 0A_2 \; \cdots \; 0A_N \; 1A_N \; \cdots \; 1A_2 \; 1A_1$

uses just one more stack, of width N = 2^n. This
extends the induction.
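
The inductive construction can be checked by brute force for small n. The
sketch below assumes the reflected ordering reconstructed above (the second
half repeats the first half in reverse, with a leading 1) and assigns each
edge to the stack indexed by the bit position in which its endpoints differ;
the function names are hypothetical.

```python
def cube_order(n):
    """Linearization of C(n): 0A_1 ... 0A_N followed by 1A_N ... 1A_1."""
    if n == 1:
        return ["0", "1"]
    prev = cube_order(n - 1)
    return ["0" + a for a in prev] + ["1" + a for a in reversed(prev)]


def cube_stacks(n):
    """One stack per bit position; edges are (left, right) position pairs."""
    order = cube_order(n)
    pos = {v: i for i, v in enumerate(order)}
    stacks = [[] for _ in range(n)]
    for v in order:
        for bit in range(n):
            if v[bit] == "0":
                w = v[:bit] + "1" + v[bit + 1:]
                stacks[bit].append(tuple(sorted((pos[v], pos[w]))))
    return stacks


def width(stack):
    """Largest number of edges of the stack passing over any one gap."""
    if not stack:
        return 0
    hi = max(b for _, b in stack)
    return max(sum(a <= g < b for a, b in stack) for g in range(hi))


def noncrossing(stack):
    """True if no two edges of the stack have interleaved endpoints."""
    return not any(a < c < b < d for a, b in stack for c, d in stack)


for n in range(1, 5):
    stacks = cube_stacks(n)
    print(n, all(noncrossing(s) for s in stacks), [width(s) for s in stacks])
```

Within each stack the edges are nested rather than interleaved, which is
exactly the stack discipline; the leading bit's stack is the widest.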




1. Is there a fixed number S such that all planar
graphs are S-stack realizable?

2. Is there a fixed number S such that all N-node
outerplanar graphs can be realized with S stacks
of width proportional to log N?

The depth-n ladder L(n) is an n x 2 grid. (a) L(n)
is outerplanar, hence one-stack realizable. (b)
Any one-stack realization of L(n) has
stackwidth >= n/2. (c) There is a two-stack
unit-width realization of L(n).

3. Is there a fixed number S such that all N-node
planar subhamiltonian graphs can be realized with
S stacks of width proportional to N^(1/2)?

4. Can Benes networks be realized with fewer than six
stacks?
Abstract

This paper discusses the motivation and proposed
functionality of an expert system to aid the design of VLSI
systems. We present some guiding principles for the
construction of such a system and discuss the organization chosen
in our particular system, SCHEMA.


Introduction 

Much of the current VLSI research in universities
concentrates on those aspects of the design process that enable
one to turn an idea into a set of masks as quickly as
possible. Converting those masks into fully functional parts
with adequate performance and acceptable yields is an
arduous task that often requires at least as much effort as
the initial design. This task is often left undone, and the
consequent degradation of functionality, performance and
yield is chalked up as a cheap price to pay for participating
in the VLSI revolution. In certain situations these costs are
small when compared with size and reproducibility benefits
over a comparable MSI implementation, but one cannot
casually ignore a factor of 10 improvement in performance.

It is always easier to incorporate functionality at the
beginning of a project than to splice it in after completion.
This is surely true of performance and yield considerations
also. It is our belief that with proper tools, designers will
be willing and able to take these aspects of the design
into account early in the design process. At first tape-out,
the designer of an integrated circuit should not only be
confident that the design has the desired functionality, but
that it also meets the performance goals that have been set.

Furthermore, the designer should not need to relax
the project’s performance goals to use the design system.
The design system must not sacrifice performance or area
(yield) significantly. Otherwise, the system would be used
only for the “unimportant” projects. In our opinion this
design system should initially act as a designer’s assistant,
keeping track of the details the designer does not care
about and performing the monotonous, repetitive
operations that would be delegated to apprentice designers.
Examples of monotonous, repetitive operations designers
request of human apprentices are: Does this inverter have a
trip point of 2.3 volts? Does the bootstrapped node boot?


Or:  Does  this  adder  operate  properly  after  a  stretched  clock 
cycle?  Using  tools  that  can  answer  these  questions  as  a 
basis,  more  powerful  and  reliable  design  synthesis  tools  can 
be  built. 

Such a design system needs a more sophisticated model
of the circuits and of the design than is incorporated in
most VLSI design systems. It would embody a fair amount
of “expertise” in circuit and system design. By virtue of
“knowledge” contained in the system and the variety of
ways that information can be used, such a system would be
considered an expert system in VLSI (circuit) design.

Criteria for Expert Systems

We are building a system called SCHEMA that
attempts to deal with these issues and provide the sort
of environment just described. The following paragraphs
describe some of the design criteria we are using to organize
SCHEMA and some of its proposed functionality.

Much of SCHEMA’s organization is based on our
experiences with MACSYMA (a very large system for
performing symbolic mathematical calculations) and its
shortcomings. The following three design criteria have been
suggested for knowledge-based systems in other areas, and
we feel they are a good guide for what should be expected
from current knowledge-based systems.

First, the system should provide an integrated,
user-friendly environment. For instance, a schematic should
only be entered once, and the system should be able
to check it against the layout or any other constraints.
Though this is mostly for the users' benefit, we have
observed that any operation a user might want to perform
eventually will be required by a program. Integrated
facilities greatly ease later development of software and
design tools.

Second, the system's internal semantics must match
those of the final user as closely as possible. The internal
routines must be able to deal with concepts like equilibrated
differential signals, nodes that need to be bootstrapped and
low-impedance outputs. If these semantics are not used,
then it is very likely that inconsistencies will creep into the
system. These inconsistencies will be hard to rationalize to
human designers.

In the same way, it is important to let the designer
provide information in whatever form is most appropriate
for the problem. Often the way the information is specified
can be used to guide later computational strategies (the
system knows what the user feels is important). For
instance, in specifying an inverter the designer should
only need to specify a few of the following interrelated
parameters: pullup/pulldown transistor width and length,
pullup/pulldown shape, DC gain, inverter trip point, low
output DC current, noise margins and rise/fall time for
a given load. Existing systems require the designer to
specify transistor widths and lengths. Often the designer is
more concerned with one of the electrical parameters, like
DC current. In this case, the designer should not be
required to specify transistor sizes to achieve the DC current
parameter; the DC current should be specified directly,
while the system computes the transistor sizes. The
inverter operates properly if these electrical parameters are
achieved, something the system can check and maintain.

Third, the system must be able to inform the designer of
the basis for its results. It must be able to provide the
designer with the reasons why circuits have the topological
or physical structure they have and why certain structures
were rejected. If the system generates a surprising or
unusual circuit, it is important that the designer be able
to determine the system's rationale if she/he is to use it
confidently.

These principles, especially the last two, lead to a
different sort of design system from that found in current
silicon compilers or other synthesis tools that are being
developed. When design fragments are synthesized,
additional information must be produced to explain the
purposes of the design's components. This additional
information is nearly always missing in silicon compilers, which
generally only produce a final layout or circuit diagram.
Consequently it is difficult for designers to modify the
design if it doesn't meet the desired performance levels or
if it fails to meet the specification for some other reason.
Since design is an evolutionary process, these sorts of
modifications are inevitable.

Design  Synthesis  in  SCHEMA 

A large number of tools that help synthesize designs
are already in common use. These range from simple
PLA generators to datapath generators and silicon
compilers. All of them convert a design specification into final
artwork or a final circuit diagram. If the design does not
behave as desired, then it is necessary to modify the original
specification and apply the synthesis tool again. There are
two major flaws with this approach.

First, designs evolve. The goals for a module are
constantly changing, and the design must be modified. If the
design is directly modified then all the expertise contained
in the synthesis tool is lost. Instead, the change must
be made to the design's specification. Consequently, the
specification language must be able to deal with all
conceivable types of constraints on the final design.

Figure 1: Proposed Synthesis Process in SCHEMA

Second, if the synthesis tools are to produce high
performance designs they must incorporate the functional
requirements provided in the specification and the low level
details that affect performance (subthreshold conduction,
junction leakages, etc.). All of these details must be juggled
in producing the final design. Developing such an omniscient
piece of software can be quite difficult. We don't expect
human experts' first try to be completely correct. They
are expected to use simulators and other analysis tools to
determine functional and performance problems. It is
important that analysis tools be carefully coupled with the
next generation of synthesis tools and silicon compilers.

The model of synthesis used in SCHEMA is shown
in figure 1. The synthesis process uses two modules and
relies on analysis tools contained in SCHEMA. The first
module converts a design specification into a first cut at
the design. This initial guess is analyzed and all deviations
from specified behavior are noted and fed to a correction
module. The correction module modifies the design in an
effort to improve its behavior. The modified design is then
reanalyzed and recorrected, iteratively converging on an
acceptable final design.
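
The loop just described can be sketched in a few lines. This is only an illustration written in Python, not SCHEMA code: the function names (synthesize, initial_guess, analyze, correct), the toy gain model and the iteration limit are all assumptions introduced for the example.

```python
def synthesize(spec, initial_guess, analyze, correct, max_iterations=20):
    """Analyze and correct a design until no deviations remain."""
    design = initial_guess(spec)              # first cut at the design
    for _ in range(max_iterations):
        deviations = analyze(design, spec)    # departures from the spec
        if not deviations:
            return design                     # acceptable final design
        design = correct(design, deviations)  # correction module
    raise RuntimeError("synthesis did not converge")


# Toy stand-ins (hypothetical): the "gain" is simply twice a size ratio.
def toy_gain(design):
    return 2.0 * design["ratio"]

def initial_guess(spec):
    return {"ratio": 1.0}

def analyze(design, spec):
    error = spec["gain"] - toy_gain(design)
    return [("gain", error)] if abs(error) > spec["tolerance"] else []

def correct(design, deviations):
    (_, error), = deviations
    return {"ratio": design["ratio"] + 0.25 * error}

spec = {"gain": 4.0, "tolerance": 0.01}
final = synthesize(spec, initial_guess, analyze, correct)
```

In the paper's plan the correction step is initially the human designer; the sketch only shows where an automatic corrector would plug in.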

The designs produced as initial approximations by
SCHEMA must contain more than just the topology and
device sizes of the circuit. The correction module needs to
know the circuit's desired behavior and each device's
contribution. This information is the behavioral description
of the circuit. The behavioral description is currently
organized as a collection of voltage and current signals at
different nodes of the topology and causal information
connected to the parameters of these signals. The signals
themselves are represented as sequences of levels and ramps,
which seems adequate for describing the behavior of most
digital circuits. The causal information is organized as
constraints that relate signal parameters to parameters of the
topology. Many of these constraints are easily derived from
Kirchhoff's current and voltage laws and the device models.
Others capture cliches or "engineering models" that are
useful for doing rough analyses of circuits. An example of this
behavioral description and how it is used is given below.
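
One possible way to hold a signal as a sequence of levels and ramps is sketched here in Python. The class names Level and Ramp, their fields, and the helper value_at are hypothetical; the paper does not give SCHEMA's actual data structures. Durations are assumed to be positive.

```python
from dataclasses import dataclass

@dataclass
class Level:
    duration: float   # how long the signal holds
    voltage: float    # constant value during that time

@dataclass
class Ramp:
    duration: float   # how long the transition takes (assumed > 0)
    start: float      # voltage at the beginning of the ramp
    end: float        # voltage at the end of the ramp

def value_at(segments, t):
    """Voltage of a piecewise-linear signal at time t from its start."""
    elapsed = 0.0
    for seg in segments:
        if t <= elapsed + seg.duration:
            if isinstance(seg, Level):
                return seg.voltage
            fraction = (t - elapsed) / seg.duration
            return seg.start + fraction * (seg.end - seg.start)
        elapsed += seg.duration
    raise ValueError("t is past the end of the signal")

# A low level, a rising ramp, then a high level.
signal = [Level(10.0, 0.0), Ramp(5.0, 0.0, 5.0), Level(20.0, 5.0)]
```

Constraints can then refer to a handful of parameters per segment (durations and end-point voltages) rather than to a sampled waveform.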

Simulators produce all of the details of how a circuit
works, but it takes time and effort to determine if the
waveforms they produce are what the designer wanted. The
analysis tools will need to be more than just simulators.
The analysis tools planned for SCHEMA will make use of
conventional simulators but they are intended to answer the
questions the designer wants answered. They will either
indicate that the design passes all the specified tests or,
if it doesn't, what portions of the circuit fail and, hopefully,
give some clues as to what the possible causes of the
failure are. The tests that are applied to the design come
from both the design specification and the behavior
specifications of the library modules that were used. For
instance, the system should verify that the noise margins
of the logic gates remain adequate under the test situation
and that bootstrapped nodes actually do bootstrap as high
as desired.

The correction module of figure 1 will initially be the
human designer, but it is clear that if the analysis tools
yield the type of information described above it will be
easier to build correction modules to handle the simpler
adjustments. In other situations circuit optimizers may
prove to be the most effective means of adjusting the design.


Behavioral Circuit Descriptions

The bootstrapped AND gate shown in figure 2 is an
example of a circuit that is extremely useful, but requires
some care when used. We envision an inexperienced
designer using the circuit as it comes from a library.
SCHEMA's analysis tools notice when the circuit is being
used improperly (based on its behavioral description) and
warn the designer. The following paragraphs describe a
portion of this circuit's behavior description.

The signals used for the behavioral description are
constructed from simple levels and ramps as shown. Here we
are only concerned with the portion of the model that deals
with the final high voltage of the output. The parasitic
capacitance on node B is denoted by $C_P$, and we denote
the gate-to-channel capacitance by $C_S$. For simplicity,
assume both capacitances are constant. Then the final output
voltage will be

$V_{OH} = \min(V_{\phi H}, V_{B2} - V_T)$.

B's bootstrap voltage is denoted by $\alpha = V_{B2} - V_{B1}$ and is
controlled by the constraint

[equation missing in source]

and the voltage $V_{B1}$ is constrained to be $V_{AH} - V_T$.

Figure 2: Dynamic AND Gate

These constraints tie the circuit's expected internal
voltages to the external stimuli, the transistors' threshold
voltages and two capacitances $C_P$ and $C_S$. If the output
of this circuit is other than expected, the system can
compare expected with observed internal waveforms, isolate
differences and propagate these differences through the
constraints to see what physical parameters are incorrect. If
the output were lower than expected, and only $V_{B2}$ is
somewhat low, then only the last constraint (for $\alpha$) would fail
to hold. Since $V_{\phi H}$ and $V_{B1}$ are both as expected we must
conclude that there is some problem with the capacitances.

This  type  of  model  and  reasoning  process  seems  to 
be  adequate  for  describing  most  first-order  phenomena,  if 
care  is  taken  when  matching  observed  transient  signals  to 
the  piece-wise  linear  signals  used  in  the  model.  It  narrows 
the  circuit  problems  to  manageable  size  and  provides  the 
designer  with  guidance  in  correcting  the  problem. 
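
The diagnostic reasoning above can be imitated with a small constraint checker. The Python sketch below is illustrative only. It assumes one reading of the symbols in the text (output high level V_OH, clock high level V_phiH, input high level V_AH, node B voltages V_B1 before and V_B2 after bootstrapping, threshold V_T), and it takes the expected bootstrap voltage alpha as a given number, because the capacitance constraint that determines it is not reproduced here. All names and the numeric values are hypothetical.

```python
def failed_constraints(obs, v_t, alpha_expected, tol=0.1):
    """Return the names of the constraints the observed voltages violate."""
    checks = {
        # V_OH = min(V_phiH, V_B2 - V_T)
        "output": abs(obs["v_oh"] - min(obs["v_phi_h"], obs["v_b2"] - v_t)) <= tol,
        # V_B1 = V_AH - V_T
        "precharge": abs(obs["v_b1"] - (obs["v_ah"] - v_t)) <= tol,
        # alpha = V_B2 - V_B1 must match the value the capacitances predict
        "bootstrap": abs((obs["v_b2"] - obs["v_b1"]) - alpha_expected) <= tol,
    }
    return [name for name, ok in checks.items() if not ok]

# Node B only reaches 5.5 V instead of the expected 7.0 V.
observed = {"v_ah": 5.0, "v_phi_h": 5.0, "v_b1": 4.0, "v_b2": 5.5, "v_oh": 4.5}
suspects = failed_constraints(observed, v_t=1.0, alpha_expected=3.0)
```

Here only the bootstrap constraint fails, which points at the capacitances, mirroring the argument in the text.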

Conclusions 

We have discussed some of the ways in which a VLSI
design system can help deal with problems of performance
in design and have outlined some of SCHEMA's features
that incorporate these ideas. Also, some general criteria
for expert systems and how these criteria relate to VLSI
design systems were discussed. The key guideline in this
endeavor has been to try to ascertain what information the
designer really wants to know and develop a system that
can provide the information.

This work was supported by DARPA grant N0014-80-C-0622.

Well organized large systems tend to consist of a large
number of small pieces of code each of which captures
a single semantic unit. These pieces of code are strung
together to form larger semantic phrases, which are in turn
components of even larger phrases. The smallest semantic
units in a polynomial manipulation system might be the
routines that add and multiply the coefficients of the
polynomials. These routines are combined (used as subroutines)
to form the routines that add and multiply polynomials,
which are components in the factoring and greatest common
divisor routines.

When  a  system  is  built  in  a  top  down  manner,  the 
larger  phrases  are  formed  first  and  are  used  to  define  the 
semantic  components  of  the  smaller  phrases.  Bottom  up 
software  design  begins  with  the  small  phrases  and  generates 
the  large  phrases.  In  practice  a  combination  of  these  two 
approaches  is  often  used.  The  manner  in  which  the  routines 
are  initially  connected  is  usually  simple,  but  in  time  the 
addition  of  new  capabilities  and  features,  and  the  neces¬ 
sities  of  performance  enhancement  generally  cause  the  de¬ 
pendency  structure  to  become  quite  complex. 

A good example of this sort of complexity is when
a new, "higher performance" representation of some data
structure is introduced for critical uses (caching and hash
tables are examples of this optimization). In building an
input/output system for a computer, we might initially
specify a stream to be a simple, character at a time
structure. When this structure is used for file operations and
networking it becomes necessary to add buffering, but it
would be unwise to use the buffered stream for terminal
I/O. These two types of streams could share large amounts
of code if the manner in which the lower level routines are
"glued together" is sufficiently powerful. For instance, the
only difference between the routines which close the stream
is that the buffered stream must flush its buffers and return
them to the buffer pool.
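
To make the stream example concrete, here is a minimal Python sketch using ordinary class inheritance as the "glue". It is not the mechanism this paper goes on to propose, and every name in it (Stream, BufferedStream, the list-based sink and buffer pool) is an assumption made for illustration.

```python
class Stream:
    """Character-at-a-time output stream writing into a list (the sink)."""
    def __init__(self, sink):
        self.sink = sink
        self.closed = False

    def write_char(self, ch):
        self.sink.append(ch)

    def close(self):
        self.closed = True


class BufferedStream(Stream):
    """Same stream, but characters are collected in a pooled buffer."""
    def __init__(self, sink, pool):
        super().__init__(sink)
        self.pool = pool
        self.buffer = pool.pop() if pool else []

    def write_char(self, ch):
        self.buffer.append(ch)

    def flush(self):
        self.sink.extend(self.buffer)
        self.buffer.clear()

    def close(self):
        # The only extra work compared with Stream.close:
        self.flush()                    # flush the buffer ...
        self.pool.append(self.buffer)   # ... and return it to the pool
        self.buffer = None
        super().close()                 # shared closing code


pool, sink = [[]], []
stream = BufferedStream(sink, pool)
stream.write_char("a")
stream.write_char("b")
pending = list(sink)                    # nothing has reached the sink yet
stream.close()
```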


We are particularly interested in the "gluing together"
process which is used to form large systems. In this paper
we will describe a system called Capsules which we feel
provides a more natural and more powerful combination
mechanism than discussed previously. This system was
originally an attempt to simplify the construction of an
algebraic manipulation system, but we are now applying it to
the development of a VLSI design system and investigating
its utility in organizing the I/O system of a complex
personal computer.

1.  Philosophy 

In most systems, when a piece of code is written it
is given a name. In the earliest programming languages
(Fortran, Basic, Lisp 1.5), when the user wants to perform
some operation (like pushing an element on a stack
or outputting a character), it is necessary to find a piece
of code that implements the desired operation and refer to
it using its name. In some systems (CLU [Lis77], Flavors
[Wei81], Loops [Bob82], Smalltalk [Ing76, Xer81]) an extra
level of indirection is introduced that allows the binding of
the piece of code to an operation name to be delayed until
after the code is written. This approach has been called
data abstraction. In compile-time languages like CLU, the
association of the code with the abstract operation is made
at compile or link time. The Lisp and Smalltalk versions of
this approach delay the binding until runtime. In either
case, an extra level of indirection has been provided between
the name representing an abstract operation and the piece
of code that implements that abstraction. There is still no
tie between the abstract operation as a semantic unit that
the user wants to use and the piece of code that implements
the operation.

In the Capsule system, the user specifies the desired
behavior of the operation and the system is responsible for
finding the piece of code that implements that operation
and is compatible with previous constraints. If a more
efficient piece of code is written that implements some
operation, the system will use the more efficient code as
long as it meets the user's specifications. This is a result of
(1) the user referring to code fragments by their semantic
purpose rather than their name, and (2) the system being
responsible for matching the semantic requests with the
code fragments in the system.

3. Capsules

In the capsule system we have assumed that all actions
occur by sending messages to objects; this is the object
oriented viewpoint. Taking the dual point of view, where
objects are passive and the correct function is chosen by
the compiler, merely moves the mechanism we are discussing
into the compiler. This is the fundamental difference
between the Lisp Machine's flavor system, which takes
the object oriented viewpoint, and CLU, which is function
oriented.

By an object we will mean something to which a message
can be sent. This will result in one (or more) values
being returned and the internal state of the object being
changed. The action caused when a message is sent to an
object is called an operation. A message is a string used as
the name of some operation. It contains no internal
structure. The piece of code executed when a message is sent
to an object is called a method. Objects may also contain
internal state which is kept in instance variables that may
be referenced by the methods.

Every object belongs to a class of equivalent objects
that have the same methods and the same set of instance
variables. This equivalence class is called a collage.
Objects are created by calling the function MAKE-OBJECT on a
collage. The methods are actually part of the collage, so
as operations are added to the collage, the objects of the
collage also acquire them.

The specification for how a method is to be constructed
is kept in a structure called a capsule. When a capsule is
added to a collage, the code within the capsule is
incorporated in one or more of the methods of the collage.
Capsules also contain information describing what their pieces
of code expect of the collage to which they are added (what
operations and instance variables there are, for instance).

The  design  of  the  capsule  system  was  based  on  our 
experience  with  some  very  large  software  systems,  and  it  is 
in  the  construction  of  large  systems  that  its  power  is  most 
apparent.  The  following  paragraphs  use  a  small  example 
to  explain  the  mechanisms  and  terminology  of  the  Capsule 
system.  As  such,  the  Capsule  mechanisms  may  seem  to  be 
overkill.  The  reader  is  asked  to  treat  this  small  example 
as  what  it  is,  and  to  map  the  capsule  mechanisms  onto 
whatever  large  software  system  is  familiar. 

The  small  example  we  use  is  the  implementation  of 
a  stack.  A  stack  is  an  object  that  accepts  two  messages, 
PUSH  and  POP.  These  messages  have  the  obvious  meaning. 
We  will  additionally  introduce  an  operation  called  TWIDDLE 
which  interchanges  the  top  two  elements  of  a  stack.  Two 
implementations  are  given  for  stacks.  One  uses  a  list  to 
implement  the  stack,  while  the  other  uses  an  array. 

An operation is the specification of an action. It
includes specifications for the number and type of arguments
and return values as well as a specification of what the
action will accomplish, called the semantics of the operation.
The current implementation does not interpret the semantics
in any manner. The semantics fields are checked for
equality to ensure that two operations perform the same
action.

To  implement  our  stack  example,  the  first  thing  we 
need  to  do  is  specify  the  operations  that  will  be  used,  PUSH, 
POP  and  TWIDDLE. 

(DEFOPERATION PUSH
  (ARGUMENTS NIL)
  (RETURNS)
  (SEMANTICS PUSH-ELEMENT-ON-STACK)
  (DOCUMENTATION "Adds an element to the top
                  of a stack"))

The ARGUMENTS field indicates that PUSH takes exactly
one additional argument, whose type is unspecified. No values
are returned. The semantics field has the atom
PUSH-ELEMENT-ON-STACK in it. Since the semantics field is
not really interpreted by the current version of this system,
this atom is used as a place holder. We will leave out the
semantics fields in the following examples. The
documentation string is used by the run-time documentation system.

(DEFOPERATION POP
  (RETURNS NIL)
  (DOCUMENTATION "Removes and returns the top
                  element of a stack"))

(DEFOPERATION TWIDDLE
  (DOCUMENTATION "Exchanges the top two
                  elements of a stack"))

The default assumptions are that an operation takes
no arguments and returns no values. These assumptions
are used in the specifications of POP and TWIDDLE.

Protocols are used to specify the characteristics of a
collage. A protocol is a list of (1) operations (including their
semantics), (2) axioms, which specify relationships among
operations, (3) instance variables, and (4) attributes, which
are other characteristics. The following protocol captures
the notion of a stack.

(DEFPROTOCOL BASIC-STACK
  (OPERATIONS PUSH POP)
  (AXIOM
    (STACK-PUSH-POP-AXIOM PUSH POP)))

That  is,  a  stack  accepts  two  operations,  PUSH  and  POP  (as 
described  above)  and  these  two  operations  obey  the  STACK- 
PUSH-POP-AXIOM,  which  means  that  a  PUSH  followed  by 
a  POP  returns  the  value  originally  pushed  and  all  the  in¬ 
ductive  variations  of  that  statement.  As  with  the  seman¬ 
tics  portions  of  operation  specifications,  the  current  system 
does  not  attempt  to  interpret  the  axioms  in  more  primitive 
terms,  but  treats  them  atomically. 
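
As an informal illustration of a protocol as a four-part list, the following Python sketch stores each part as a set and compares axioms atomically, in the spirit of the description above. It is not the Capsule implementation; the Protocol class, its field names and the implied_by check are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Protocol:
    name: str
    operations: frozenset = frozenset()
    axioms: frozenset = frozenset()              # opaque, compared for equality
    instance_variables: frozenset = frozenset()
    attributes: frozenset = frozenset()

    def implied_by(self, other):
        """True if `other` offers everything this protocol lists."""
        return (self.operations <= other.operations
                and self.axioms <= other.axioms
                and self.instance_variables <= other.instance_variables
                and self.attributes <= other.attributes)

BASIC_STACK = Protocol(
    name="BASIC-STACK",
    operations=frozenset({"PUSH", "POP"}),
    axioms=frozenset({("STACK-PUSH-POP-AXIOM", "PUSH", "POP")}),
)

# What some collage might currently provide (made-up contents).
collage_protocol = Protocol(
    name="some collage",
    operations=frozenset({"PUSH", "POP"}),
    axioms=frozenset({("STACK-PUSH-POP-AXIOM", "PUSH", "POP")}),
    instance_variables=frozenset({"STACK"}),
)
```

Because axioms are plain tuples, two protocols agree on an axiom only when its name and operation list match exactly.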

The following slightly more complex protocol illustrates
how the mathematical abstraction of an algebraic ring may
be specified.

(DEFPROTOCOL RING
  (OPERATIONS PLUS MINUS ZERO TIMES)
  (AXIOMS
    (COMMUTATIVE-LAW PLUS)
    (ASSOCIATIVE-LAW PLUS)
    (ASSOCIATIVE-LAW TIMES)
    (ALGEBRAIC-IDENTITY PLUS ZERO)
    (ALGEBRAIC-INVERSE PLUS MINUS)
    (DISTRIBUTIVE-LAW PLUS TIMES))
  (ATTRIBUTES
    (CHARACTERISTIC)))

It is assumed that the DEFOPERATIONs for the specified
operations appear elsewhere. This specification indicates
that there is a ZERO operation that returns the additive
identity. An alternative implementation might require ZERO
to be an instance variable. Any collage that adheres to this
protocol is an abstract ring. Any piece of code that depends
only on this protocol can be added to an abstract ring. This
is somewhat closer to the mathematical understanding of
abstraction than previous systems.

The DEFOPERATION form defines a protocol that contains
a single operation and nothing else. The name of
this protocol is the same as the name of the operation
unless specified otherwise. This conveniently allows the use of
operation names and protocols interchangeably.

All code is put into capsules. Capsules consist of four
basic parts: (1) a required protocol, what the capsule
expects of the collage it is to be added to, (2) an asserted
protocol, things to add to the protocol of a collage when
the capsule is added, (3) performance information about
the algorithm contained in the capsule, and (4) the code
itself. It often happens that more than one capsule could
be added to a collage to satisfy some requirement. The
performance information is used to break those deadlocks.

In order to allow incremental compilation and debugging,
the specification of a capsule is separated into two
pieces. A DEFCAPSULE form is used to indicate the first three
parts of the specification while separate DEFALGORITHM
forms are used for each piece of code. (In the terminology
used in the flavor system, capsules are extensions of
flavors, and algorithms are methods. Our algorithms can
be more complex than the simple pieces of code that flavor
methods must be, but it would take us too far afield to
discuss these capabilities here.) The following implementation
of the BASIC-STACK protocol illustrates this.

(DEFCAPSULE LIST-STACK
  (ASSERTS
    (PROTOCOL BASIC-STACK)
    (INSTANCE-VARIABLE (STACK ()))
    (ATTRIBUTE STACK-IMPLEMENTED-AS-LIST)))

(DEFALGORITHM (LIST-STACK PUSH) (ELEMENT)
  (SETQ STACK (CONS ELEMENT STACK)))

(DEFALGORITHM (LIST-STACK POP) ()
  (PROG1 (FIRST STACK)
         (SETQ STACK (REST STACK))))


The LIST-STACK capsule implements a stack in terms
of a list of the elements of the stack. It makes no
assumptions of the collage to which it is to be added. When it
is added to a collage, the BASIC-STACK protocol is added,
along with an instance variable STACK and the two pieces of
code given in the DEFALGORITHMs. In addition, an attribute is
added to the collage that indicates the stack is implemented
using a list.

The capsule that implements stacks in terms of arrays
is quite similar (we have ignored the problem of running off
the end of the array for simplicity here).

(DEFCAPSULE ARRAY-STACK
  (ASSERTS
    (PROTOCOL BASIC-STACK)
    (INSTANCE-VARIABLES
      (STACK (MAKE-ARRAY '(100)))
      (INDEX 0))
    (ATTRIBUTE STACK-IMPLEMENTED-BY-ARRAY)))

(DEFALGORITHM (ARRAY-STACK PUSH) (ELEMENT)
  (SETF (AREF STACK INDEX) ELEMENT)   ; store in the first free slot
  (SETQ INDEX (+ INDEX 1)))

(DEFALGORITHM (ARRAY-STACK POP) ()
  (SETQ INDEX (- INDEX 1))
  (AREF STACK INDEX))                 ; INDEX now names the old top element

The function MAKE-COLLAGE is used to create collages.
It takes an arbitrary number of arguments, each of which
is either a capsule or the name of a capsule, or a protocol
or the name of a protocol. Thus the following forms can
be used to construct stack collages of the two types defined
thus far.

(SETQ C1 (MAKE-COLLAGE 'LIST-STACK))

(SETQ C2 (MAKE-COLLAGE 'ARRAY-STACK))

The form (MAKE-COLLAGE 'BASIC-STACK) would result in
an error because it is ambiguous. There are two capsules
that can be used to create a collage with the BASIC-STACK
protocol and there is no reason to prefer one over the other.
(In this situation we have seriously considered just picking
the first capsule. The user hasn't given any reason to prefer
one capsule over the other so why not pick one at random?)

Once we have a couple of collages to work with, we
can create stacks using the MAKE-OBJECT function. Its first
argument is either a collage or the name of one. Additional
arguments are passed to the initialization method if there
is one. The following forms create a stack from the collage
C1 and push two elements onto it.

(SETQ STACK (MAKE-OBJECT C1))

(SEND STACK 'PUSH 1)

(SEND STACK 'PUSH 2)

Now, (SEND STACK 'POP) will return 2.
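
For readers without access to a Lisp Machine, the relationship between collages, objects and messages can be mimicked in Python. The sketch below is only an analogy under stated assumptions: Collage, StackObject, make_object and send are hypothetical stand-ins, not the Capsule system's functions, and the list-based methods are transliterations of the LIST-STACK algorithms.

```python
import copy

class Collage:
    """Holds the method table and initial instance variables for its objects."""
    def __init__(self, variables, methods):
        self.variables = variables
        self.methods = methods           # message name -> function

class StackObject:
    def __init__(self, collage):
        self.collage = collage           # pointer to the shared method table
        self.state = copy.deepcopy(collage.variables)

def make_object(collage):
    return StackObject(collage)

def send(obj, message, *args):
    return obj.collage.methods[message](obj.state, *args)

def list_push(state, element):
    state["STACK"] = [element] + state["STACK"]

def list_pop(state):
    top, state["STACK"] = state["STACK"][0], state["STACK"][1:]
    return top

c1 = Collage({"STACK": []}, {"PUSH": list_push, "POP": list_pop})
stack = make_object(c1)
send(stack, "PUSH", 1)
send(stack, "PUSH", 2)
popped = send(stack, "POP")
```

Because every object only points at its collage's method table, a method added to the table later is immediately visible to objects that already exist.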

Though we have defined what is meant by the TWIDDLE
operation, no capsule implements it. The following capsule
provides an algorithm that "TWIDDLEs" the top two
elements of an abstract stack.

(DEFCAPSULE BASIC-TWIDDLE
  (REQUIRES BASIC-STACK)
  (PERFORMANCE 1)
  (ASSERTS
    (PROTOCOL TWIDDLE)))

(DEFALGORITHM (BASIC-TWIDDLE TWIDDLE) ()
  (LET (TOP SECOND)
    (SETQ TOP (SEND SELF 'POP)
          SECOND (SEND SELF 'POP))
    (SEND SELF 'PUSH TOP)
    (SEND SELF 'PUSH SECOND)))

This capsule has a required protocol, BASIC-STACK.
Thus it can only be combined with collages that already
possess the PUSH and POP operations. It can be added to
any abstract stack.

The routine ADD-PROTOCOLS is used to add protocols
to collages. Its first argument is a collage and the rest of
its arguments are protocols that the user wants the collage
to meet. Thus the form

(ADD-OPERATION C1 'TWIDDLE)

adds a TWIDDLE operation to C1. More precisely, each
collage contains a table that gives the relationships between
message names and pieces of code. Each object constructed
from a collage contains a pointer to this table. When an
operation is added to a collage, the system isolates a capsule
that both provides the desired operation and which can be
added to the collage. The code portion of the capsule is
then added to the collage's method table. Thus all objects
of the collage are now extended with the new operation.

It is easy to define a slightly more efficient version
of TWIDDLE for arrays. The ARRAY-TWIDDLE capsule does
precisely this.

(DEFCAPSULE ARRAY-TWIDDLE
  (REQUIRES
    (ATTRIBUTE STACK-IMPLEMENTED-BY-ARRAY))
  (PERFORMANCE 2)
  (ASSERTS
    (PROTOCOL TWIDDLE)))

(DEFALGORITHM (ARRAY-TWIDDLE TWIDDLE) ()
  (LET ((TEMP))
    (SETQ TEMP (AREF STACK (- INDEX 1)))
    (SETF (AREF STACK (- INDEX 1))
          (AREF STACK (- INDEX 2)))
    (SETF (AREF STACK (- INDEX 2)) TEMP)))

Notice that this capsule does not actually use the PUSH
and POP operations. It only assumes that there are instance
variables STACK and INDEX, and that they can be
interpreted to form a stack. This is the purpose of the
STACK-IMPLEMENTED-BY-ARRAY attribute.

With this capsule added to the system, adding the
TWIDDLE operation to an ARRAY-STACK collage will get the
new code, while previously it would have used the routine
in BASIC-TWIDDLE.
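
The choice between BASIC-TWIDDLE and ARRAY-TWIDDLE can be pictured as a filter followed by a ranking. The Python sketch below makes two assumptions that the text only implies: that a larger PERFORMANCE number is preferred, and that a tie is treated as an error (as with the ambiguous MAKE-COLLAGE case). The Capsule record and choose_capsule function are hypothetical names, and requirements are flattened into a single set of feature names for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capsule:
    name: str
    requires: frozenset     # features the collage must already have
    asserts: frozenset      # features the capsule adds
    performance: int = 0

def choose_capsule(capsules, collage_features, wanted):
    """Pick the applicable capsule with the best performance figure."""
    candidates = [c for c in capsules
                  if wanted in c.asserts and c.requires <= collage_features]
    if not candidates:
        raise LookupError("no capsule provides " + wanted)
    best = max(c.performance for c in candidates)
    winners = [c for c in candidates if c.performance == best]
    if len(winners) > 1:
        raise LookupError("ambiguous choice for " + wanted)
    return winners[0]

LIBRARY = [
    Capsule("BASIC-TWIDDLE", frozenset({"BASIC-STACK"}),
            frozenset({"TWIDDLE"}), performance=1),
    Capsule("ARRAY-TWIDDLE", frozenset({"STACK-IMPLEMENTED-BY-ARRAY"}),
            frozenset({"TWIDDLE"}), performance=2),
]

list_features = frozenset({"BASIC-STACK", "STACK-IMPLEMENTED-AS-LIST"})
array_features = frozenset({"BASIC-STACK", "STACK-IMPLEMENTED-BY-ARRAY"})
```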

As a final note, if a message is sent to an object that
does not possess a handler for that message then a default
handler is run. One of the default handlers with which
we have been experimenting attempts to add the desired
operation to the object's collage and then tries again. Thus
if a stack did not have a TWIDDLE handler, a TWIDDLE
message could be sent to it anyway, since one could be created
for it and installed on the fly. If the stack was an
ARRAY-STACK then the efficient ARRAY-TWIDDLE capsule would be
added; otherwise BASIC-TWIDDLE can be used.
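
A default handler of this kind can be approximated as follows. This Python sketch is an assumption-laden analogy rather than the experimental handler itself: objects are plain dictionaries, the AVAILABLE table stands in for the capsule library, and the names send, default_handler and basic_twiddle are made up for the example.

```python
def send(obj, message, *args):
    method = obj["collage"].get(message)
    if method is None:
        return default_handler(obj, message, args)
    return method(obj, *args)

def basic_twiddle(obj):
    top = send(obj, "POP")
    second = send(obj, "POP")
    send(obj, "PUSH", top)
    send(obj, "PUSH", second)

# Stand-in for the library of capsules that could supply an operation.
AVAILABLE = {"TWIDDLE": basic_twiddle}

def default_handler(obj, message, args):
    method = AVAILABLE.get(message)
    if method is None:
        raise LookupError("no handler can be built for " + message)
    obj["collage"][message] = method     # install in the shared method table
    return send(obj, message, *args)     # then try the message again

collage = {
    "PUSH": lambda obj, element: obj["items"].append(element),
    "POP": lambda obj: obj["items"].pop(),
}
stack = {"collage": collage, "items": []}
send(stack, "PUSH", 1)
send(stack, "PUSH", 2)
send(stack, "TWIDDLE")                   # handler is created on first use
```

The cost of locating and installing the method is paid once; later TWIDDLE messages find it directly in the table.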

There are two points to notice about this scenario.
First, when new functionality was added to a collage, the
user specified only what the desired semantics were and did
not specify, directly or indirectly, a particular piece of code.
Second, the functionality could be added dynamically while
the  system  is  running.  In  some  domains,  there  are  several 
algorithms  for  performing  an  operation  and  it  can  be  very 
expensive  to  decide  which  to  use.  It  is  better  not  to  pay 
that  price  until  it  is  truly  necessary. 

5. Conclusions

When  the  capsule  system  was  used  to  describe  a  portion 
of  the  stream  code  used  in  the  LISP  Machine,  we  noticed 
that  the  protocol  specifications  seemed  more  verbose  than 
we  would  have  liked.  Closer  examination  revealed  that 
many  of  the  comments  we  had  penciled  into  the  version 
of  the  code  that  used  flavors  were  being  translated  into 
protocols.  This  reinforces  our  impression  that  the  capsule 
system  is  partially  an  attempt  to  force  the  programmer  to 
make  the  code  that  is  written  more  precise. 

We feel that if the programmer makes this effort, the
programming system will be in a much better position to
aid in the development of large software systems. The
Capsule system is a partially successful attempt to provide
a mechanism through which the programmer can truly
express what the program is intended to do.

graph and the cube-connected-cycles) have long been known to be powerful
networks for parallel computation. Recently, an entirely different class of
networks has been discovered that appears to rival the hypercube-based networks
in usefulness. In this paper, we describe the new networks (which we call meshes
of trees), and we show how they can be used to solve problems such as sorting,
matrix-vector multiplication, discrete Fourier transform, transitive closure,
minimum spanning tree, integer multiplication and matrix multiplication in
O(log n) or O(log^2 n) steps.

1.  Introduction 

Graphs  such  as  the  hypercube,  the  shuffle-exchange  graph  [S71,  S80]  and  the 
cube-connected-cycles [PV79] have long been known to be very powerful
networks  for  parallel  computation.  In  fact,  most  of  the  fast  parallel  algorithms 
known  for  problems  such  as  sorting  and  discrete  Fourier  transform  are  based  on 
the  unique  structure  of  these  networks. 

Recently, an entirely different class of networks has been discovered that
appears to rival the hypercube-based networks in their usefulness. The new
networks  are  known  by  a  variety  of  names  (including  the  orthogonal  trees  and  the 
orthogonal  forests),  but  we  call  them  meshes  of  trees.  The  structure  inherent  in 
meshes  of  trees  can  be  found  in  algorithms  that  are  up  to  ten  years  old,  but  the 
networks  themselves  have  only  recently  been  defined.  Three  groups  of 
researchers  are  responsible  for  independently  formalizing  the  definition  of  a  mesh 
of  trees: 

1) Nath, Maheshwari and Bhatt [N82, NMB83] showed how the networks
could be used for sorting, discrete Fourier transform, transitive closure
and minimum spanning tree,

2) Cappello and Steiglitz [CS81] showed how the networks could be used
for integer multiplication, and

3) Leighton [L81, L83] obtained optimal bounds for laying out the
networks on a VLSI chip and showed how the networks could be used
for sorting, discrete Fourier transform and matrix multiplication.

In  this  paper,  we  describe  algorithms  for  all  of  these  problems.  In  some  cases 
(such  as  for  matrix  multiplication  [PV80,  L83]  and  integer  multiplication  [CS81]), 
the  algorithms  were  known  previously  and  we  have  included  them  for 
completeness.  In  other  cases  (such  as  for  sorting  and  transitive  closure),  the 
algorithms  are  new  and  consume  less  of  some  resource  (e.g.,  time,  area  or 
processor size) than did the previously known algorithms. Except for the graph
problems (which take O(log^2 n) steps), all of the algorithms require only O(log n)
steps to execute.

The paper is divided into five sections. In Section 2, we define the
2-dimensional mesh of trees and attempt to provide some intuition as to why it is
a good network for parallel computation. We also review the VLSI layout results
that are known for the network. In Section 3, we show how the n-by-n mesh of
trees can be used to sort n numbers in O(log n) steps. We also show how
pipelining can be used to decrease processor size, increase data rate, and decrease
layout area. In Section 4, we describe algorithms for matrix-vector multiplication,
Fourier transform, transitive closure, minimum spanning tree and integer
multiplication. In Section 5, we discuss multidimensional meshes of trees and
show how the n-by-n-by-n mesh of trees can be used to multiply two n-by-n
matrices in O(log n) steps. We also define the powerful shuffle-tree graph and
explain why it can efficiently simulate algorithms designed for hypercube
networks as well as those designed for meshes of trees.

Throughout  the  paper,  we  assume  that  nodes  which  are  linked  by  an  edge  can 
communicate  in  a  single  time  step.  This  assumption  is  made  in  many  of  the 
papers  in  the  literature.  (An  exception  is  [NMB83],  which  assumes  logarithmic 
communication  time.)  If  longer  communication  times  are  required,  then  the 
number  of  steps  calculated  for  algorithms  in  this  paper  must  be  scaled  up 
accordingly. 

2. The 2-Dimensional Mesh of Trees

2.1  Definition  and  Properties 

The 2-dimensional mesh of trees M_{2,n} is constructed as follows. Starting with
an n-by-n grid of nodes (where n is a power of two) and adding nodes and edges
as specified, construct a complete binary tree in each row and column of the grid.
The trees should be constructed so that the leaves in each tree are precisely the
nodes in the corresponding row or column of the original grid. In particular, the
(i, j) grid node (i.e., the node in the ith row and jth column of the grid) should
double as the ith leaf of the jth column tree and the jth leaf of the ith row tree.
(The ith leaf is determined by counting from left to right in the canonical drawing
of the complete binary tree in the plane.)

As an example, we have drawn M_{2,4} in Figure 1. The nodes in the original
4-by-4  grid  are  represented  by  dots.  The  nodes  that  were  added  to  form  row  trees 
are  drawn  as  small  triangles  while  those  added  to  form  column  trees  are  shown  as 
small  squares.  The  row  tree  edges  are  drawn  with  solid  lines  while  dashed  lines 
represent  column  tree  edges. 


Figure 1: The 4-by-4 mesh of trees M_{2,4}.

It is not difficult to show that the n-by-n mesh of trees M_{2,n} has N = 3n^2 - 2n
nodes, 4n^2 - 4n edges and maximum node degree three. More importantly, the
graph has a small diameter, that is, every pair of nodes in the graph can be linked
by a path of length at most 4 log n = Θ(log N). This means that any pair of processors in
a mesh of trees network can communicate in a short (logarithmic) amount of
time, an indication that the network will be useful for parallel computation.
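
The counts quoted above can be checked mechanically for small n. The following Python sketch builds the graph from the definition in Section 2.1 and measures it by breadth-first search; the node naming scheme and the function names are illustrative assumptions, not part of the original construction.

```python
from collections import deque

def mesh_of_trees(n):
    """Adjacency sets of the n-by-n mesh of trees (n a power of two)."""
    adj = {("grid", i, j): set() for i in range(n) for j in range(n)}

    def add_tree(kind, t, leaves):
        # Build a complete binary tree bottom-up over the given leaves.
        level, depth = leaves, 0
        while len(level) > 1:
            depth += 1
            parents = []
            for k in range(0, len(level), 2):
                p = (kind, t, depth, k // 2)
                adj[p] = set()
                for child in level[k:k + 2]:
                    adj[p].add(child)
                    adj[child].add(p)
                parents.append(p)
            level = parents

    for t in range(n):
        add_tree("row", t, [("grid", t, j) for j in range(n)])
        add_tree("col", t, [("grid", i, t) for i in range(n)])
    return adj

def eccentricity(adj, source):
    """Largest breadth-first distance from source to any other node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def summary(n):
    """Return (nodes, edges, maximum degree, diameter) of the mesh of trees."""
    adj = mesh_of_trees(n)
    nodes = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    degree = max(len(nbrs) for nbrs in adj.values())
    diameter = max(eccentricity(adj, u) for u in adj)
    return nodes, edges, degree, diameter

nodes, edges, degree, diameter = summary(4)
# nodes == 3*16 - 2*4 == 40, edges == 4*16 - 4*4 == 48, degree == 3
```

For n = 2 every tree consists of a single root with two leaves, so the maximum degree is two; the degree-three figure applies from n = 4 onward.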

In addition to having a small diameter, the mesh of trees also has a nice
recursive structure. For example, if we remove the roots of the row and column
trees of M_{2,4} and the edges incident to them, we are left with four copies of
M_{2,2}, one in each quadrant of the original grid. (For example, see Figure 2.)

Figure 2: M_{2,4} with row and column roots removed.

In general, if we remove the nodes and edges in the top k levels of the binary
trees in M_{2,n}, we will be left with 2^{2k} copies of M_{2,n/2^k}. This property is
important for two reasons. First, it means that the mesh of trees is particularly
well suited for use with algorithms that are based on the divide-and-conquer
paradigm. (This is another reason why the mesh of trees is such a powerful
network for parallel computation.) Second, the fact that M_{2,n} can be decomposed
into four disjoint copies of M_{2,n/2} by the removal of O(n) nodes and edges means
that M_{2,n} has a 2^{1/2}-bifurcator of size O(n). (A graph is said to have a
2^{1/2}-bifurcator of size F if it can be partitioned into disjoint subgraphs by the
removal of F edges. The subgraphs, in turn, must have 2^{1/2}-bifurcators of size
F/2^{1/2}. The subgraphs need not be identical in size, but after 2 log F levels of
recursion, the graph must be completely decomposed into isolated nodes. See
[BL83, L82] for more information on bifurcators and their applications.) We will
use the fact that M_{2,n} has a 2^{1/2}-bifurcator of size O(n) in Section 2.3 where we
discuss VLSI layouts for the mesh of trees.
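
The recursive decomposition can be illustrated with a short Python sketch that builds the mesh of trees without the top k levels of every tree and counts the connected pieces that remain. The function name and the union-find bookkeeping are illustrative assumptions; the sketch checks only the number and size of the pieces, not that each piece is itself a mesh of trees.

```python
from collections import Counter

def component_sizes(n, k):
    """Map component size to component count after removing the top k
    levels of every row and column tree of the n-by-n mesh of trees."""
    height = n.bit_length() - 1            # log2(n), n a power of two
    parent = {("grid", i, j): ("grid", i, j)
              for i in range(n) for j in range(n)}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def add_tree(kind, t, leaves):
        level = leaves
        # Only the lowest (height - k) internal levels survive the removal.
        for depth in range(1, height - k + 1):
            next_level = []
            for q in range(0, len(level), 2):
                p = (kind, t, depth, q // 2)
                parent[p] = p
                for child in level[q:q + 2]:
                    parent[find(child)] = find(p)
                next_level.append(p)
            level = next_level

    for t in range(n):
        add_tree("row", t, [("grid", t, j) for j in range(n)])
        add_tree("col", t, [("grid", i, t) for i in range(n)])

    per_component = Counter(find(x) for x in list(parent))
    return dict(Counter(per_component.values()))

sizes = component_sizes(8, 1)   # four pieces, each with 3*16 - 2*4 = 40 nodes
```

Each remaining piece has 3m^2 - 2m nodes with m = n/2^k, which matches the node count of M_{2,m}.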

2.2 Relationship to the Complete Bipartite Graph

Just as the shuffle-exchange graph, cube-connected-cycles and related
networks derive their computational power from the structure of the hypercube,
the 2-dimensional mesh of trees can be seen to derive its power from the structure
of the complete bipartite graph. The relationship between the n-by-n mesh of
trees M_{2,n} and the 2n-node complete bipartite graph K_{n,n} can best be explained
by illustration. In Figure 3a, we have drawn K_{4,4}. In Figure 3b, we have drawn
M_{2,4} in a way that conforms to the structure of K_{4,4}. The nodes and edges of
M_{2,4} are drawn according to the same conventions followed in Figure 1.


The illustration makes clear the correspondence between rows and columns of
M_{2,n} and nodes (left-hand side and right-hand side, respectively) of K_{n,n}, and
between grid nodes of M_{2,n} and edges of K_{n,n}. Given this correspondence, it is
not at all surprising that the 2-dimensional mesh of trees is a powerful network
for parallel computation.

2.3 VLSI Layouts

Although our primary emphasis in this paper is on the structural and
computational properties of the mesh of trees, it is worth mentioning the results
relevant to Thompson grid model layouts [T79, T80] for the mesh of trees.

A quick glance at Figure 1 suggests a natural O(n^2 log^2 n)-area layout for
M_{2,n}. This layout also has Θ(n^2 log n) wire crossings, and edges of length
Θ(n log n). For practical purposes, this layout is the simplest known and might
well be optimal. Mathematically speaking, it is within a constant factor of optimal
in area [L81, L83] but is suboptimal in other respects. For example, the fact that
M_{2,n} has a 2^{1/2}-bifurcator of size O(n) means that there is a recursively defined
layout for M_{2,n} that has area O(n^2 log^2 n), O(n^2 log n) wire crossings and
maximum edge length O(n log n / log log n) [BL83, L82]. In [L81, L83], we prove
that these bounds cannot be improved by more than a constant factor.

In addition to achieving the optimal area, crossing number and edge length
bounds, the bifurcator-based layouts described in [BL83, L82] have a number of
other useful properties. For example, the same bounds can be achieved for
networks in which every node is replaced by an O(log n)-by-O(log n)-size
processor. In fact, the area required for a layout with s-by-s-size processors is
Θ(n^2 (log n + s)^2). The layouts described in [BL83, L82] are also fault-tolerant in
the sense that the same bounds can be achieved in the presence of faulty
processors. Lastly, the layouts allow space for variable-size transistors to power
signals across long wires. Without this feature, the assumption that
communication across a long wire can be accomplished in unit time is less
realistic.

Since any layout for M_{2,n} has perimeter Ω(n log n), it is possible (at least
mathematically) to connect the 2n roots of M_{2,n} to pins on the exterior of the
layout without affecting any of the results mentioned above. This fact allows us
to input data to the network through the row and column roots at a rate of 2n
items per computational step. (For large values of n, this assumption may not be
realistic given the current fabrication constraints on pin count. In such cases, the
data must be entered at a slower rate and multiplexed to the row and column
roots.)


3. Sorting Using the 2-Dimensional Mesh of Trees

In what follows, we describe three algorithms for sorting using the mesh of
trees or simple variants thereof. All three algorithms are based on a simple
scheme described by Muller and Preparata in [MP75]. The algorithms essentially
consist of comparing every number to every other number, computing ranks (i.e.,
positions in the sorted list), and then permuting the numbers according to rank.
All three variants sort a list of n numbers in O(log n) steps but each uses
pipelining to reduce some other measure of complexity. In Section 3.1, we
describe an implementation of the algorithm on the mesh of trees using
processors that can only compute a constant number of 1-bit operations in a
single time step (i.e., in one bit step). In Section 3.2, we show how to implement
the algorithm to sort p lists of numbers in O(log n + p) steps (but using
O(log n)-bit-size processors), thereby increasing the data rate of the computation. In
Section 3.3, we show how to implement the algorithm on a simplified mesh of
trees in order to decrease the area necessary to sort n numbers in O(log n) steps.
The latter algorithm is also reported in [NMB83].

3.1  Pipelining  to  Reduce  Processor  Size 

In many models of computation, it is assumed that processors can perform a
constant number of O(log n)-bit word operations in a single step. In what follows,
we restrict ourselves to consider processors that can perform only a constant
number of 1-bit operations in a single step. In particular, we will show how to
pipeline M_{2,n} in order to sort n O(log n)-bit numbers in O(log n) bit steps. For
m-bit numbers, the algorithm requires O(log n + m) steps, although this is
suboptimal for m >> Θ(log n).

Let w_1, ..., w_n denote the n numbers to be sorted and let r be the rank
function for the list. (Formally, r(i) is the position of w_i in the largest-first sorted
ordering of the list.) Starting at the roots, input w_i bit by bit (leading order bit
first) into the ith row and column trees for each i, 1 ≤ i ≤ n. Pass the bits down
each tree so that after log n steps, the leading bit of w_i has reached each leaf of the
ith row and column trees. At this point, the (i, j) grid node sees the leading bits
of w_i and w_j. If they differ, the grid node halts and stores the value 1 if w_i > w_j
and 0 otherwise. If the leading bits of w_i and w_j are identical, they are discarded
and the (i, j) grid node next compares the 2nd leading bits. Comparison of w_i
and w_j continues in this fashion until they are distinguished or until they are
found to be identical on all the bits. If equal, then the grid node halts and stores
the value 1 if i > j and 0 otherwise.

After a total of log n + m steps, every grid node has halted and stored a value
indicating whether or not the number entered into its row tree was larger than the
number entered into its column tree. For example, if the four numbers to be
sorted on M_{2,4} were (in binary) 101, 001, 101 and 110 (in that order), the values
stored in the grid nodes would be as shown in Figure 4.

Figure 4: The values stored in the grid nodes of M_{2,4}
after all comparisons are completed.

Notice that r(j) can be found by summing the values stored in the jth column
tree. (For reasons that will soon become apparent, we require that the values of
r(i) range from 0 to n-1 instead of from 1 to n.) This is because the ith leaf of the
jth column tree contains a 1 precisely when w_i > w_j or when w_i = w_j and i > j.
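
As a worked instance of the comparison and ranking rule, the Python sketch below recomputes the grid values for the four example numbers directly from the rule stated in the text (it is not copied from Figure 4). The function names are illustrative, and indices are 0-based here, whereas the text counts rows and columns from 1.

```python
def first_difference(a, b, m):
    """Scan two m-bit values from the leading bit, as a grid node does.
    Returns 1 if a > b, 0 if a < b, and None if all m bits agree."""
    for k in range(m - 1, -1, -1):
        abit, bbit = (a >> k) & 1, (b >> k) & 1
        if abit != bbit:
            return 1 if abit > bbit else 0
    return None

def comparison_grid(w, m):
    """grid[i][j] is the bit stored by the (i, j) grid node."""
    n = len(w)
    grid = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            bit = first_difference(w[i], w[j], m)
            if bit is None:               # equal values: break ties by index
                bit = 1 if i > j else 0
            grid[i][j] = bit
    return grid

def ranks(w, m):
    """r[j] is the sum of the bits in column j of the comparison grid."""
    grid = comparison_grid(w, m)
    n = len(w)
    return [sum(grid[i][j] for i in range(n)) for j in range(n)]

w = [0b101, 0b001, 0b101, 0b110]
grid = comparison_grid(w, 3)
r = ranks(w, 3)
# grid == [[0, 1, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]]
# r == [2, 3, 1, 0]
```

The two copies of 101 receive the distinct ranks 2 and 1 because ties are broken by index, which is what makes the ranks a permutation of 0, ..., n-1.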

These  values  can  in  fact  be  summed  in  2logn  steps  as  follows.  At  each  step,  each 
node  in  the  tree  stores  a  bit  and  transmits  a  bit  to  its  father  in  the  tree.  The 
transmitted  bit  (or  parity  bit)  is  the  least  significant  bit  of  the  sum  of  the  bit  stored 
in  the  last  step  and  the  two  bits  being  transmitted  from  its  sons.  The  stored  bit  (or 
carry  bit)  is  the  most  significant  bit  of  the  same  sum.  A  node  starts  transmitting 
bits  only  after  it  receives  transmitted  bits,  and  stops  once  it  has  transmitted  its 
carry  bit  and  its  sons  have  stopped  transmitting.  (The  single  exception  to  this  rule 
is that the root never transmits its last carry bit, which is necessarily a 0.) Initially,
the stored bits of non-leaf nodes are 0. The algorithm commences when the
leaves  transmit  their  stored  values. 
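
The summation can be pictured as a tree of bit-serial adders. The Python sketch below is a simplified model under stated assumptions: each node adds the two least-significant-bit-first streams arriving from its sons, emitting the parity bit and keeping the carry, but the exact step-by-step timing and halting rules of the text are not simulated, and the root's final carry bit is kept rather than dropped. The function names are illustrative.

```python
def serial_add(left_bits, right_bits):
    """Add two equal-length bit streams given least significant bit first."""
    carry = 0                      # the stored (carry) bit starts at 0
    out = []
    for a, b in zip(left_bits, right_bits):
        total = a + b + carry
        out.append(total & 1)      # transmitted (parity) bit
        carry = total >> 1         # stored (carry) bit
    out.append(carry)              # finally the carry itself is transmitted
    return out

def tree_sum_bits(leaves):
    """Sum the 0/1 leaf values of a complete binary tree with serial adders.
    The number of leaves is assumed to be a power of two."""
    level = [[v] for v in leaves]
    while len(level) > 1:
        level = [serial_add(level[k], level[k + 1])
                 for k in range(0, len(level), 2)]
    return level[0]                # bits of the sum, least significant first

def from_bits(bits):
    """Decode a least-significant-bit-first stream into an integer."""
    return sum(bit << k for k, bit in enumerate(bits))

column = [1, 0, 1, 1]              # leaves of the 2nd column tree in the example
bits = tree_sum_bits(column)       # [1, 1, 0], i.e. the value 3
```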

As an example, we have shown the sequence of steps taken by the 2nd
column tree in Figure 5. Numbers inside the nodes indicate stored bits.
Numbers on the edges indicate transmitted bits. Nodes are marked with X's after
their last transmission. After 2 log n steps, the sum is output bit by bit (least
significant bit first) at the root (where it may be stored in a (log n)-length stack of
nodes).

Figure 5: Sequence of steps used to sum the values
in the leaves of the 2nd column tree.

After 3 log n + m steps, the ith column root contains w_i and r(i). It remains only
to route w_i to the r(i)th row root for each i in order to complete the sorting. The
routing of w_i is accomplished by first paving a trail from the root of the ith
column tree to the r(i)th leaf of that tree and then from that leaf (which is also in
the r(i)th row tree) to the root of the r(i)th row tree.

The algorithm to accomplish this task is quite simple. In the first step, the
root of the ith column tree observes the leading order bit of r(i). If the bit is 0,
then the node passes all remaining bits of r(i) and w_i to its left son (i.e., the son
that is closer to the top row). If the bit is 1, then the node passes all remaining
bits to its right son. In either case, the leading bit is discarded (i.e., it is not
passed on to either son). The nodes in lower levels of the tree act the same way.
They observe (and discard) the first bit that they see. If it is 0, remaining bits are
passed to the left son. Otherwise, the remaining bits are passed to the right son.
It is not difficult to check that the r(i)th leaf (and no other leaf) of the ith column
tree starts seeing w_i after 2 log n steps. (It should now be clear why we chose to
express r(i) in the range from 0 to n-1.) The leaf immediately passes the bits of w_i
on to its father in the row tree, as do all the nodes in the row trees, until the last
bit of w_i reaches the root of the r(i)th row tree. This happens after log n + m
additional steps. Since we have insured that r(i) ≠ r(j) for i ≠ j, there will not
be any conflicts in the row trees, and w_1, ..., w_n will appear in sorted order in
the row roots after a total of 6 log n + 2m steps. When m = O(log n), the running
time is thus O(log n) bit steps.
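
The routing rule and the step count can be sketched in Python as follows. This is an illustration rather than a bit-level simulation: `route_leaf` follows the bits of a rank down one column tree, `sort_by_rank` delivers each number to the row root chosen by its rank, and `bit_steps` simply evaluates 6 log n + 2m. All three function names are assumptions made for this sketch, and indices are 0-based.

```python
def route_leaf(rank, levels):
    """Follow the bits of rank, leading bit first, down a tree of given depth
    and return the index of the leaf that is reached."""
    lo, hi = 0, 1 << levels            # current subtree covers leaves lo..hi-1
    for k in range(levels - 1, -1, -1):
        mid = (lo + hi) // 2
        if (rank >> k) & 1:
            lo = mid                   # bit 1: continue in the right son
        else:
            hi = mid                   # bit 0: continue in the left son
    return lo

def sort_by_rank(w):
    """Largest-first sort of n numbers (n a power of two) by rank and routing."""
    n = len(w)
    levels = n.bit_length() - 1
    rank = [sum(1 for i in range(n)
                if w[i] > w[j] or (w[i] == w[j] and i > j))
            for j in range(n)]
    row_roots = [None] * n
    for i in range(n):
        leaf = route_leaf(rank[i], levels)   # leaf of column tree i, in row `leaf`
        if row_roots[leaf] is not None:
            raise ValueError("two numbers routed to the same row tree")
        row_roots[leaf] = w[i]
    return row_roots

def bit_steps(n, m):
    """Total bit steps quoted in the text: 6 log n + 2m."""
    return 6 * (n.bit_length() - 1) + 2 * m

result = sort_by_rank([0b101, 0b001, 0b101, 0b110])   # [6, 5, 5, 1]
steps = bit_steps(4, 3)                                # 6*2 + 2*3 = 18
```

Because the ranks run from 0 to n-1, the log n bits of r(i) name a leaf directly, so the leaf reached by `route_leaf` always has index r(i).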


3.2  Pipelining  to  Increase  Data  Rate 


If the processors used in the mesh of trees are allowed to perform a constant
number of O(log n)-bit word operations in a single time step, then the sorting
algorithm described in Section 3.1 can be greatly simplified. In particular, it is no
longer necessary to split up numbers into bits for pipelining. Rather, numbers
from successive lists to be sorted can be pipelined so as to increase the number of
lists that can be sorted in O(log n) steps. We describe how this can be done in
what follows.

Let p denote the number of n-element lists to be sorted. At the first time step,
enter the ith number of the first list into the roots of the ith row and column trees
for all i. At the second time step, transmit these numbers to the sons of the roots
and enter the numbers from the second list into the roots. At each subsequent
step, continue processing the previously entered lists and enter the data from the
next list to be sorted into the roots. At any time step, processors on any level will
be handling data from at most three lists: one list that is being passed to the
leaves for comparison, one list for which the ranks are being computed by
summing, and one list for which the final routing is taking place. Once a list is
entered into the network, it will be output after 4 log n steps. Hence p lists can be
sorted in 4 log n + p steps. This is significantly faster than the O(p log n) steps
required by the bit-based algorithm described in Section 3.1. (Of course, the
processors are larger too.)

3.3 Pipelining to Reduce Layout Area

For some applications, decreasing the area of the layout is more important
than increasing the number of problems that can be solved in O(log n) steps. In
such cases, it is useful to sort using a simplified mesh of trees. The n-by-n
simplified mesh of trees S_{2,n} is constructed from M_{2,n} by first removing the
internal nodes and edges from all but every (log n)th row and column tree, and
then inserting edges so that where there was once a copy of M_{2,log n}
there is now a (log n)-by-(log n) mesh with a tree in the top row and leftmost column.
For example, S_{2,4} is shown in Figure 6. (This network has been discovered
independently by many individuals including Nath, Maheshwari and Bhatt who
call it the orthogonal tree cycles [NMB83].)

It is not difficult to construct a layout for S_{2,n} that has O(n^2) area,
crossings and edges of length O(n). Interestingly, the simplified mesh of trees can
sort a list of n numbers in O(log n) steps. (We assume, of course, that the
processors are capable of performing O(log n)-bit word operations in a single
step.) Given a list of numbers to be sorted w_1, ..., w_n, the algorithm proceeds
as follows.

Figure 6: The 4-by-4 simplified mesh of trees S_{2,4}.

At the first time step, w_{(i-1)log n + 1} is input to the roots of the ith row and column
trees. In the next time step, these values are passed on to the sons of the roots
and w_{(i-1)log n + 2} is input to the roots of the ith row and column trees. This process
continues for log n steps whereupon w_{(i-1)log n + k} is in every (log n - k)-level node of the
ith row and column trees. During the next log n steps, the data is passed to the
grid nodes through the mesh edges. After a total of 2 log n steps, the (i, j) grid
node will thus contain both w_i and w_j. All pairwise comparisons take place
simultaneously and the values are passed through the meshes to the trees for
summing. After log n additional steps, the rank of w_{(i-1)log n + 1} appears at the root
of the ith column tree. The routing of these numbers can now commence. After
another log n steps, every rank has been computed and the numbers are in various
levels of the column trees. After log n more steps, all the numbers have reached
the appropriate grid nodes. Starting simultaneously, the numbers begin moving
in the other direction, first to leaves of the row trees and then on to the roots of
the row trees. Eventually, the numbers will appear in sorted order in blocks of
log n numbers at each of the n/log n row roots. The total running time is 8 log n
word steps.

It is worth remarking that (like the bit-based algorithm described in Section
3.1) the sorting algorithm for the simplified mesh of trees cannot be further
pipelined to sort several lists simultaneously.

4. Other Algorithms for the 2-Dimensional Mesh of Trees

In what follows, we briefly describe a variety of fast algorithms for the
2-dimensional mesh of trees. We commence with an O(log n)-step algorithm for
matrix-vector multiplication in Section 4.1. This algorithm (which originally
appeared in [L83, NMB83]) has applications to Fourier transform and
convolution as well as to a variety of other problems. In Section 4.2, we describe
algorithms for the transitive closure and minimum spanning tree problems. We
conclude in Section 4.3 with an algorithm for integer multiplication on a
mesh-of-trees-like structure that was developed by Cappello and Steiglitz [CS81].


4.1 Matrix-Vector Multiplication

Given a fixed n-by-n matrix S = {s_ij | 1 ≤ i, j ≤ n}, we will show how to use
M_{2,n} to compute the product of S and any input n-vector in 2 log n steps. As S is
fixed, it is not considered to be part of the on-line input. Rather, it is considered
to be part of the program (in the form of off-line input) and we assume that the
value of s_ij is initially stored in the (i, j) grid node for all i and j. The algorithm
proceeds as follows.

Given any input vector v = {v_j | 1 ≤ j ≤ n}, input v_j into the root of the jth
column tree for each j at the first time step. Pass the entries of v down the
column trees so that after log n steps, each leaf in the jth column tree has received
the value of v_j. Computation of the products {s_ij v_j | 1 ≤ i, j ≤ n} can now
take place simultaneously. Afterwards, we can find the values of the product
vector Sv by summing the values of the leaves in each row tree. This summing
operation takes an additional log n steps. Thus after a total of 2 log n steps, the
values of the product are output at the roots of the row trees.
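
The data flow of this algorithm can be sketched sequentially in Python: broadcast down the column trees, multiply at the grid nodes, then sum up the row trees. The function names are illustrative, n is assumed to be a power of two, and the step counter only tallies tree levels in the manner described above.

```python
def tree_sum(values):
    """Sum the leaves of a complete binary tree one level at a time.
    Returns the sum and the number of levels climbed."""
    level = list(values)
    steps = 0
    while len(level) > 1:
        level = [level[k] + level[k + 1] for k in range(0, len(level), 2)]
        steps += 1
    return level[0], steps

def mesh_matvec(S, v):
    """Compute Sv following the data flow of the mesh-of-trees algorithm."""
    n = len(v)
    log_n = n.bit_length() - 1
    # Column tree j broadcasts v[j] to all n of its leaves (log n steps).
    received = [[v[j] for j in range(n)] for _ in range(n)]
    # Grid node (i, j) multiplies its stored s_ij by the value it received.
    products = [[S[i][j] * received[i][j] for j in range(n)] for i in range(n)]
    # Row tree i sums its leaves (another log n steps).
    sums = [tree_sum(row) for row in products]
    steps = log_n + sums[0][1]
    return [total for total, _ in sums], steps

S = [[1, 2, 0, 0],
     [0, 1, 0, 0],
     [3, 0, 1, 0],
     [0, 0, 0, 2]]
product, steps = mesh_matvec(S, [1, 2, 3, 4])   # ([5, 2, 6, 8], 4)
```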

The form of the algorithm for matrix-vector multiplication is very similar to
that for sorting. Hence, it should not be surprising that the algorithm just
described can be pipelined in the three ways described in Section 3. For
example, the product of S with p n-vectors can be computed in 2 log n + p steps
and the simplified mesh of trees can be used to compute a single product in 4 log n
steps (thereby reducing the area). Reducing the size of the processors is slightly
more complicated since multiplication is harder than comparison. The difficulty
can be overcome by expanding each grid node into a small (log n)-bit multiplier.

The matrix-vector product algorithm has a number of useful applications.
Most importantly, it can be used to compute discrete Fourier transforms by
setting S to be the well-known discrete Fourier transform matrix. As a result, the
mesh of trees can compute convolutions, interpolations and polynomial products
as well as a variety of other tasks in O(log n) steps.
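
To make the Fourier transform application concrete, the Python sketch below forms the discrete Fourier transform matrix, applies it as an ordinary matrix-vector product, and uses it to multiply two polynomials. The sign convention exp(-2πi/n) for the forward transform, the function names, and the rounding to integer coefficients are assumptions of this sketch; the product is a cyclic convolution, so n must exceed the degree of the result.

```python
import cmath

def dft_matrix(n, sign=-1):
    """n-by-n matrix with entries omega^(i*j), omega = exp(sign * 2*pi*i / n)."""
    omega = cmath.exp(sign * 2j * cmath.pi / n)
    return [[omega ** (i * j) for j in range(n)] for i in range(n)]

def matvec(S, v):
    """Plain matrix-vector product, standing in for the mesh computation."""
    return [sum(S[i][j] * v[j] for j in range(len(v))) for i in range(len(S))]

def polynomial_product(a, b):
    """Multiply integer polynomials given as length-n coefficient lists
    (lowest degree first) with three matrix-vector products."""
    n = len(a)
    forward, inverse = dft_matrix(n, -1), dft_matrix(n, +1)
    fa, fb = matvec(forward, a), matvec(forward, b)
    pointwise = [x * y for x, y in zip(fa, fb)]
    return [round((c / n).real) for c in matvec(inverse, pointwise)]

spectrum = matvec(dft_matrix(4), [1, 1, 1, 1])    # approximately [4, 0, 0, 0]
square = polynomial_product([1, 1, 0, 0], [1, 1, 0, 0])
# (1 + x)^2 = 1 + 2x + x^2, so square == [1, 2, 1, 0]
```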

4.2  Graph  Problems 

In  [NMB83],  Nath,  Maheshwari  and  Bhatu  describe  an  0(/og^ff)-step 
algorithm  for  the  connected  components  problem.  They  also  show  how  to 
modify  the  algorithm  to  obtain  0(/og^/7)-step  algorithms  for  the  transitive  closure 
and  minimum  spanning  tree  problems.  The  algorithms  are  based  on  the 
Hirschberg-Chandra-Sarwate [HCS79] O(log^2 n)-time algorithm for connected
components, which uses a shared-memory model of parallel computation. In
what follows, we show how to modify the [HCS79] algorithm so that it can be
executed on an n-by-n mesh of trees in O(log^2 n) steps. (A similar modification is
described in [HV83].) By following the techniques described in [N82, NMB83],
the algorithm can then be modified to obtain O(log^2 n)-step algorithms for the
transitive closure and minimum spanning tree problems.

Given an undirected n-node graph G, let A = {a_ij | 1 ≤ i, j ≤ n} be the
adjacency matrix for G and let D be an array such that D(i) is the smallest
number of a node in the connected component of G containing node i. Assume
that a_ii = 1 for all i. The [HCS79] algorithm for computing D proceeds as follows.

op1: set D(i) = i for all i
op2: do op2.1 through op2.4 for log n iterations
    op2.1: set E(i) = min_j {D(j) | a_ij = 1} for all i
    op2.2: set C(i) = min_j {E(j) | D(j) = i} for all i
    op2.3: do op2.3.1 for log n iterations
        op2.3.1: set C(i) = C(C(i)) for all i
    op2.4: set D(i) = C(D(i)) for all i
op3: end
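
The listing above can be exercised with a sequential Python sketch. Several choices here are assumptions rather than part of the listing: nodes are numbered from 0, C(i) defaults to i when no node currently carries the label i, and the outer loop runs until the labels stop changing instead of for a fixed number of iterations, so the sketch does not depend on the iteration bound. It models the operations only, not their implementation on the mesh of trees.

```python
def connected_component_labels(A):
    """Label every node with the smallest node number in its component.
    A is a symmetric 0/1 adjacency matrix; a_ii = 1 is implied."""
    n = len(A)
    D = list(range(n))                            # op1
    jumps = max(1, (n - 1).bit_length())          # enough doublings for depth < n
    while True:
        # op2.1: smallest label among the neighbours of i (including i itself)
        E = [min(D[j] for j in range(n) if A[i][j] or i == j)
             for i in range(n)]
        # op2.2: smallest E(j) over the nodes j currently labelled i
        C = list(range(n))                        # assumed default: C(i) = i
        for j in range(n):
            C[D[j]] = min(C[D[j]], E[j])
        # op2.3: pointer jumping, C(i) = C(C(i))
        for _ in range(jumps):
            C = [C[C[i]] for i in range(n)]
        # op2.4: relabel every node
        new_D = [C[D[i]] for i in range(n)]
        if new_D == D:                            # assumed stopping rule
            return D
        D = new_D

def adjacency(n, edges):
    """Build a symmetric adjacency matrix from a list of undirected edges."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

labels = connected_component_labels(adjacency(6, [(0, 3), (3, 5), (1, 4)]))
# labels == [0, 1, 2, 0, 1, 0]
```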

It is easily shown (see [NMB83], for example) that op2.1, op2.2, op2.3.1 and
op2.4 can be implemented in O(log n) steps on an n-by-n mesh of trees. The
problem is that op2.3.1 is executed log^2 n times by the preceding algorithm. This
problem can be overcome by implementing the following algorithm on the mesh
of trees.

op1: set D(i) = C(i) = i and S(i) = "active" for all i
op2: do op2.1 through op2.6 for 10 log n iterations
    op2.1: set E(i) = min_j {D(j) | a_ij = 1 and S(D(j)) = "active"} for all i
    op2.2: if S(C(i)) = "active", then set C(i) = min_j {E(j) | D(j) = i} for all i
    op2.3: if C(i) ≠ i, then set S(C(i)) = S(i) = "inactive" for all i
    op2.4: set C(i) = C(C(i)) for all i
    op2.5: if C(r) = r and no new values of C(i) were set to r in op2.4,
           then set S(r) = "active" for all r
    op2.6: if S(C(D(i))) = "active", then set D(i) = C(D(i)) for all i


The key difference between the two algorithms is that the latter algorithm processes
the necessary number of iterations of op2.3.1 in parallel with the rest of the
operations. The S(j) variables are included to ensure that the functioning of
op2.3.1 (op2.4 in the latter algorithm) doesn't conflict with the other operations.
(In particular, S(j) is "inactive" precisely when the label j is being processed by
op2.4 and thus when it is not available for processing by the other operations.)
The proof that this algorithm performs as claimed is somewhat delicate and is
described in [HV83], so we have not included it here. The proof that the
algorithm can be implemented on M_{2,n} using O(log^2 n) steps is not difficult
(especially given [NMB83]) and we leave it as an exercise for the reader.

As was the case with sorting and matrix-vector multiplication, these algorithms
can be pipelined in the ways described in Section 3.

4.3 Integer Multiplication

In [CS81], Cappello and Steiglitz show how to use a variant of the mesh of
trees structure to multiply two n-bit numbers in O(log n) bit steps. We briefly
summarize this result in what follows.

The network for integer multiplication is constructed from an n-by-2n grid of
nodes. Nodes and edges are added to form complete binary trees in the rows,
columns and transverse diagonals of the grid. As before, the leaves of the trees
should coincide with the nodes of the grid. As an example, we have indicated the
location of the diagonal trees for n=4 in Figure 7.


Figure 7: Location of diagonal trees in a 4-by-8 grid of nodes.


Let a_1 ... a_n and b_1 ... b_n be the binary representations of the two numbers
to be multiplied. At the first step, input a_i to the root of the (n-i+1)st row tree
for each i and input b_i to the root of the ith diagonal tree for each i. Pass the
values from the roots to the leaves of the trees so that after log n steps, the grid
node where the (n-i+1)st row meets the jth diagonal contains a_i and b_j. These
values are then simultaneously multiplied and stored. For example, Figure 8
displays the stored values for the product of 1101 and 1011.

Figure 8: Stored values for the product of 1101 and 1011.


It is clear from Figure 8 how the algorithm multiplies two numbers: it simply
simulates the elementary school method of digital multiplication. At this point
it remains only to sum the n 2n-bit numbers contained in the rows. This is done
in log* n stages as described below.

In the first stage, the bits in each column are summed as described in Section
3.1 and then stored in a log n-length stack at the row root. This stage takes 2 log n
steps and reduces the problem to that of adding log n 2n-bit numbers. The new
problem can be handled in a recursive fashion. For example, in the second stage,
the algorithm takes 2 log log n steps and leaves log log n 2n-bit numbers to be
summed. After log* n stages (O(log n) steps), we are left with two 2n-bit numbers to
be summed. This final sum is computed with a Brent-Kung adder [BK80] in
O(log n) steps. (Note that the Brent-Kung network must be interconnected to the
roots of the mesh of trees to form the multiplication network. By further
modifying the network, the algorithm can be pipelined in the ways described in
Section 3.)
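
The staged column summation can be imitated in software. The Python sketch
below is an illustration only and does not model the network of [CS81]: the
function names are hypothetical, the rows are held as ordinary integers, and
each column count is written with as many binary digits as it needs (roughly
log m digits for m rows) rather than in a stack at a tree root.

```python
def partial_products(a, b, n):
    """Rows of the schoolbook product of two n-bit integers a and b."""
    return [(b << i) if (a >> i) & 1 else 0 for i in range(n)]


def column_sum_stage(rows, width):
    """Replace the rows by the binary digits of the per-column bit counts."""
    counts = [sum((row >> c) & 1 for row in rows) for c in range(width)]
    new_rows = []
    k = 0
    while any(count >> k for count in counts):
        # Digit k of the count in column c carries weight 2**(c + k).
        new_rows.append(sum(((count >> k) & 1) << (c + k)
                            for c, count in enumerate(counts)))
        k += 1
    return new_rows


def multiply(a, b, n):
    """Return (product, number of column-sum stages used)."""
    rows = partial_products(a, b, n)
    stages = 0
    while len(rows) > 2:
        rows = column_sum_stage(rows, 2 * n)
        stages += 1
    # The last addition stands in for the final adder.
    return sum(rows), stages


product, stages = multiply(0b1101, 0b1011, 4)
```

Every stage preserves the total of the rows, because a count in column c
contributes count * 2**c before and after it is split into binary digits.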


5. Multidimensional Meshes of Trees


5.1  Definitions  and  Properties 


The 2-dimensional mesh of trees can be easily generalized to higher
dimensions. For example, the 3-dimensional n-by-n-by-n mesh of trees M_{3,n} can
be constructed as follows. Starting with an n-by-n-by-n cube of nodes and adding
nodes where indicated, construct a set of complete binary trees in each of the
three dimensions of the cube. As before, the trees should be constructed so that
the leaves are precisely the nodes of the original cube and so that the subgraph
induced on each octant of nodes is M_{3,n/2}. As an example, we have drawn M_{3,2} in
Figure 9. The nodes in the original cube of nodes appear as dots while the
internal nodes appear as squares.

Figure 9: The 2-by-2-by-2 mesh of trees


The general r-dimensional mesh of trees M_{r,n} is formed from an n-by-...-by-n
hypercube in a similar manner. In general, removal of the roots and edges that
are in the top level of the binary trees will leave 2^r disjoint copies of M_{r,n/2}. It is
easily shown that M_{r,n} has N = Θ(rn^r) nodes, maximum degree r, diameter
2r log n = Θ(log N), and a 2^{(r-1)/r}-bifurcator of size Θ(N^{(r-1)/r}). Optimal
layouts for M_{r,n} are discussed in [L81, L83].
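
For small cases these properties can be checked by building the graph
explicitly. The Python sketch below is an illustration under stated
assumptions: the node naming scheme, the heap-style numbering of tree nodes,
and the function names are all hypothetical, and n must be a power of two.

```python
from collections import deque
from itertools import product


def mesh_of_trees(r, n):
    """Adjacency sets of the r-dimensional n-sided mesh of trees."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for dim in range(r):
        # One complete binary tree per line of n cube nodes along axis `dim`.
        for rest in product(range(n), repeat=r - 1):
            def node(h):
                # Heap numbering: h has children 2h and 2h+1; leaves are n..2n-1.
                if h >= n:
                    return ("leaf",) + rest[:dim] + (h - n,) + rest[dim:]
                return ("tree", dim, rest, h)

            for h in range(1, n):
                link(node(h), node(2 * h))
                link(node(h), node(2 * h + 1))
    return adj


def node_count(r, n):
    """n^r leaves plus r * n^(r-1) trees with n - 1 internal nodes each."""
    return (r + 1) * n ** r - r * n ** (r - 1)


def diameter(adj):
    """Largest shortest-path distance, by breadth-first search from each node."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best
```

For the 2-by-2-by-2 case this yields 20 nodes (8 cube nodes and 12 roots) and
a diameter of 6, in agreement with 2r log n for r = 3 and n = 2.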

5.2 Matrix Multiplication Using the 3-Dimensional Mesh of Trees

In what follows, we describe a 2 log n-step algorithm for multiplying two n-by-n
matrices using M_{3,n}. This algorithm was originally discovered by Preparata and
Vuillemin [PV80], although the underlying network was not discovered to be the
mesh of trees until recently. The algorithm can be pipelined in several ways (see
[PV80]), including those ways described in Section 3. In what follows, we review
the simplest version of the algorithm.

At the first time step, the two matrices to be multiplied are entered into the
network via the roots of the trees in two of the dimensions (one dimension for
each matrix). The entries are passed down through the trees so that after log n
steps, the (i, j, k) grid node contains the (i, j) entry of the first matrix and the
(j, k) entry of the second matrix. All multiplications are then performed
simultaneously. The entries of the product matrix are then calculated by
summing the values of the leaves of each tree in the third (previously unused)
dimension. The summing process takes an additional log n steps. The total
computation takes 2 log n steps.
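
The data movement just described can be mirrored in a short sequential
sketch. The Python code below is illustrative only: the function name is
hypothetical, n is assumed to be a power of two, and the broadcast down the
trees is replaced by direct indexing.

```python
def mesh_of_trees_matmul(A, B):
    """Multiply two n-by-n matrices the way the 3-dimensional mesh does.

    Returns the product and the number of tree levels used for the sums.
    """
    n = len(A)
    # After the broadcast, cube node (i, j, k) holds A[i][j] and B[j][k];
    # all n^3 products are formed at once.
    cube = [[[A[i][j] * B[j][k] for k in range(n)] for j in range(n)]
            for i in range(n)]
    C = [[0] * n for _ in range(n)]
    levels = 0
    for i in range(n):
        for k in range(n):
            # Sum along the j-dimension tree, one level at a time.
            level = [cube[i][j][k] for j in range(n)]
            depth = 0
            while len(level) > 1:
                level = [level[t] + level[t + 1]
                         for t in range(0, len(level), 2)]
                depth += 1
            C[i][k] = level[0]
            levels = depth
    return C, levels
```

The returned level count equals log n, which is the cost of the summing phase;
the broadcast phase contributes the other log n steps.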

5.3 The Shuffle-Tree Graph

The r-dimensional mesh of trees was defined as a natural extension of the
2-dimensional mesh of trees. M_{r,n} can also be viewed as a generalization of the
r-cube, another powerful communications network. For example, M_{r,2} is the
r-cube with every edge replaced by a path of length two. (Glance at Figure 9 once
again.) Viewed in this light, the r-dimensional mesh of trees motivates the
definition of a shuffle-tree graph in much the same way that the r-cube motivates
the definition of the 2^r-node shuffle-exchange graph. In what follows, we review
the transformation of the hypercube into a shuffle-exchange graph and show how
the same transformation can be applied to M_{r,n} to form a shuffle-tree graph.

Because the r-cube has nodes of degree r, it is sometimes not appropriate for
practical applications. In such cases, the shuffle-exchange graph is often used
instead. The 2^r-node shuffle-exchange graph is formed from the (2^r-node) r-cube
by removing all edges except those that link nodes differing in the last bit, and
then inserting edges between nodes that are left or right cyclic 1-shifts of one
another. The edges from the original r-cube are called exchange edges and the
edges inserted between nodes that are 1-shifts of one another are called shuffle
edges (owing to their ability to shuffle a deck of data in a single step [DGK81,
LLM83]). Mathematically speaking, the shuffle edges are formed by rotating the
nodes of the r-cube in r dimensions about the line between the all-0 node and the
all-1 node. The rotation permutes the nodes in r-cycles which correspond to
r-cycles of the shuffle edges. (When r is composite, degenerate cycles with fewer
than r edges may appear.) As an example, we have included Figure 10. (Those
readers who would like to know more about the properties of shuffle-exchange
graphs might find [LLM83] of interest.)

Figure 10: a) the 3-cube (dashed edges are to be removed); b) the 8-node
shuffle-exchange graph (dashed edges are shuffle edges).


If the edges of the hypercube are used in only one dimension at a time by an
algorithm and if the dimensions are used in the natural cyclic order (as they are
for almost all hypercube algorithms), then the same algorithms can be
implemented on the shuffle-exchange graph by passing the data along shuffle
edges between each hypercube operation. Since passing the data along shuffle
edges corresponds to a rotation of the underlying hypercube structure of the
graph, the exchange edges of the shuffle-exchange graph effectively simulate the
currently active dimension of edges in the hypercube. As a result, the time
necessary to run the algorithm on the shuffle-exchange graph is at most double
the time required by the hypercube. The advantage of the shuffle-exchange
graph is that it has node degree at most three.
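
As a concrete illustration, the Python sketch below builds the 2^r-node
shuffle-exchange graph and runs one simple hypercube algorithm on it, namely
summing a value from every node into every node. The function names and the
choice of algorithm are assumptions made for this example; nodes are
represented by r-bit integers.

```python
def rotate_right(u, r):
    """Cyclic right 1-shift of the r-bit label u."""
    return (u >> 1) | ((u & 1) << (r - 1))


def shuffle_exchange_graph(r):
    """Return the sets of exchange edges and shuffle edges on 2^r nodes."""
    n = 1 << r
    exchange = {frozenset((u, u ^ 1)) for u in range(n)}
    shuffle = {frozenset((u, rotate_right(u, r)))
               for u in range(n) if rotate_right(u, r) != u}
    return exchange, shuffle


def all_reduce_sum(values, r):
    """Give every node the sum of all values, using only the two edge types."""
    n = 1 << r
    vals = list(values)
    steps = 0
    for _ in range(r):
        # Exchange step: combine across the currently active dimension.
        vals = [vals[u] + vals[u ^ 1] for u in range(n)]
        # Shuffle step: rotate the data so the next dimension becomes active.
        moved = [0] * n
        for u in range(n):
            moved[rotate_right(u, r)] = vals[u]
        vals = moved
        steps += 2
    return vals, steps
```

The routine uses 2r steps where the r-cube would use r, which matches the
factor-of-two bound stated above.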

The (r, n)-shuffle-tree graph T_{r,n} is constructed from M_{r,n} by first removing all
but one dimension of the trees, and then by adding edges that correspond to a
rotation of M_{r,n} in r-space about the line between nodes (1, ..., 1) and
(n, ..., n). Two nodes (u_1, ..., u_r) and (v_1, ..., v_r) of the original n^r-node
cube are leaves of the same tree in T_{r,n} if u_i = v_i for all i < r, and they are linked
by a shuffle edge if one is a cyclic 1-shift of the other. For example, the tree and
shuffle edges of a 2-dimensional shuffle-tree graph T_{2,n} are shown in Figures 11a
and 11b, respectively. Notice that the shuffle edges correspond to a simple
transposition of the nodes. That is because a transposition is simply a rotation
in 2-space about the line between (1,1) and (n,n).
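
The two adjacency rules can be written down directly. In the Python sketch
below, which is an illustration with hypothetical function names, a leaf is an
r-tuple of coordinates in 1..n, and the tree dimension is taken to be the last
coordinate.

```python
from itertools import product


def same_tree(u, v):
    """Leaves share a tree when they agree in every coordinate but the last."""
    return u[:-1] == v[:-1]


def shuffle_edges(r, n):
    """Edges joining each leaf to its cyclic 1-shift."""
    edges = set()
    for u in product(range(1, n + 1), repeat=r):
        v = u[1:] + u[:1]
        if v != u:
            edges.add(frozenset((u, v)))
    return edges
```

For r = 2 every shuffle edge joins (i, j) to (j, i), which is the
transposition noted in the text; leaves on the diagonal have no shuffle edge.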


Figure 11: a) the tree edges of T_{2,n}; b) the shuffle edges of T_{2,n}.

It is worth noting that all of the algorithms described in this paper for M_{r,n}
can be run in twice the number of steps on the (r, n)-shuffle-tree graph T_{r,n}.
This is due to the fact that all the algorithms perform operations along one
dimension of trees at a time. When that dimension changes, the data is simply
passed along the shuffle edges in T_{r,n}. As with the shuffle-exchange graph,
however, T_{r,n} has node degree at most three.

As we have just seen, the class of (r, n)-shuffle-tree graphs is very powerful.
At one end of the spectrum (n=2), the class includes the class of
shuffle-exchange graphs, for which many fast algorithms are known [S71, S80]. At the
other end of the spectrum (r=2,3), the class includes the 2- and 3-dimensional
meshes of trees, for which many good algorithms are also now known. Whether
or not the graphs in the center of the spectrum are useful (we suspect that they
are) is an interesting open question.

Silicon-On-Insulator Bipolar Transistors

M. RODDER AND D. A. ANTONIADIS, MEMBER, IEEE


Abstract: Thin-film lateral n-p-n bipolar transistors (BJT) have
been fabricated in moving melt zone recrystallized silicon on a 0.5-µm
silicon dioxide substrate thermally grown on bulk silicon.
Current-voltage characteristics of devices with different base widths (5 and
10 µm) have been analyzed. The use of a metal gate over oxide
covering the base region has allowed the devices to be operated as
n-channel MOSFET's as well; thus surface effects on device
characteristics have been investigated under varying gate-bias voltages.
Maximum dc current gain values of 2.5 were achieved with a 5-µm base
width and values around 0.5 with a 10-µm base width. Higher gain
values were impeded by onset of high-level injection which occurred at
low currents because of light base doping of these devices.

To date, silicon-on-insulator (SOI) technology has found
application only in MOSFET device fabrication. This is
because of the simplicity of these devices and also because the
extremely low minority-carrier lifetimes in the dominant SOI films,
namely silicon on sapphire (SOS), have made bipolar devices
impractical. However, recent SOI technology based on moving
melt-zone recrystallization has yielded films with significantly
improved lifetimes [1]. SOI bipolar devices may be desirable,
either by themselves or in combination with MOSFET's in the
same circuit, because they exhibit higher transconductance for a
given area and bias current than MOSFET's. This paper reports
the first realization of silicon-on-silicon-dioxide bipolar devices.

The samples were prepared using the nonseeded moving
melt-zone recrystallization technique which has been reported
elsewhere [2]. The insulator material underlying the 0.5-µm Si
film was 0.5 µm of thermally grown SiO2 on (111) oriented
n-type doped silicon wafers. After recrystallization, the
encapsulation layer was removed and a 500-Å buffer oxide layer
was grown at 1000°C in 50 min. Next, a double boron
implantation was performed over the entire sample with doses
of X 10* * cm-* at 20 keV and 3.5 X 10** cm** at
200 keV; this implant determined the base doping profile.
Following the implant, the base regions were covered with a
polyimide layer defined using an O2 plasma etching process. A
phosphorus implantation to form the emitter and collector
regions was performed with a dose of 3.5 X 10** cm”* at
250 keV. The polyimide layer was then removed and an Al
layer was patterned to be used as a mask during the
subsequent SF6 plasma etch of the recrystallized Si, so as to
form individual Si islands. The 500-Å oxide was then
removed and a new 1500-Å gate oxide was thermally grown
at 900°C using a dry-wet-dry oxidation. Contact holes were
then defined and, subsequently, Al was deposited and
patterned to define the contact pads and gate electrodes. The
samples were finally annealed in forming gas at 450°C for 30
min. A cross section and top view of the merged lateral
BJT-MOSFET device is shown in Fig. 1.

This work was supported by DARPA under Grant N0014d:-80-0622.

The base (channel) dimension perpendicular to current
flow was nominally 100 µm. Nominal base widths (channel
lengths) were 5 and 10 µm. After measuring the actual base
widths on the sample and accounting for lateral diffusion,
the widths are estimated to be approximately 3.5 and 8.5 µm.

Measurements of base and collector currents (I_B and I_C)
versus base-emitter voltage V_BE and as a function of top
gate-to-base voltage V_GB were obtained using a two-channel
Keithley 619 Electrometer incorporated into an automated
data acquisition system. Values of oxide thickness and of the
MOSFET threshold voltage were also obtained using the above
system and a capacitance meter; typical values are t_ox ≈
1500 Å and V_T ≈ 4 V at zero source-bulk (emitter-base)
bias. Current-voltage characteristics are shown in Fig. 2 for
typical 5- and 10-µm base-width devices. All measurements
shown were performed with collector-base bias voltage V_CB =
0 V to avoid any base-width modulating effects, and back-gate
(Si substrate) to base bias voltage equal to -10 V. A small
number of devices were found to exhibit poorer characteristics
than those shown in Fig. 2. Given the fact that the silicon
recrystallization was unseeded, we attribute these poor
characteristics to inevitable large-angle grain boundaries through
those devices [3]. No significant performance difference
between the majority of devices oriented parallel or perpendicular
to the predominant sub-boundary direction was observed.

Fig. 1. Cross section and top view of the lateral bipolar-merged MOSFET
device.

Two characteristics of the collector current can be seen
from Fig. 2(a): (a) for sufficiently negative values of V_GB
(-3 V), I_C is independent of variations of V_GB; (b) as V_GB
increases, I_C shows a strong dependence on the gate-to-body
bias voltage. Since we are primarily interested in the bipolar
characteristics of the thin-film device, it is important to
understand which of the I_C curves are strictly due to lateral
bipolar action.

The distinguishing characteristics of a true (lateral) bipolar
device are two-fold. First, if the carrier quasi-Fermi levels are
assumed constant through the base-emitter junction depletion
region, then the collected bipolar current exhibits a
characteristic of the form I_C ∝ exp(qV_BE/nkT), where n = 1 for
low, and n = 2 for high-level injection conditions. Hence, on a
semi-log plot of I_C versus V_BE, for low-level injection the
slope should correspond to the well-known 60 mV/decade at
room temperature. Second, the collected bipolar current is due
to the flow of injected minority carriers through a base region
which satisfies the quasi-neutrality condition, i.e., the net
charge density in the base ρ_B satisfies the condition ρ_B << qN_A,
where N_A is the base-doping concentration. Therefore, if a
component of collected current is due to injection of minority
carriers into a region which does not satisfy the quasi-neutrality
condition, then this component is not due to true bipolar
transistor action but rather to field-effect action. Such a
current component might be due to injection into a depleted
region at the top or bottom Si-SiO2 interface.

Fig. 2. (a) Typical I-V characteristic for a short-channel length device.
(b) Typical I-V characteristic for a long-channel length device. All
measurements at V_CB = 0 V and -10 V bias of back gate with respect
to the emitter (source).
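
The 60-mV/decade figure follows from the exponential law: the current grows
tenfold when V_BE rises by n (kT/q) ln 10. The short Python check below is
only a worked calculation; the function name and the choice of 300 K for room
temperature are assumptions.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q = 1.602176634e-19     # elementary charge, C


def slope_mv_per_decade(n, temperature_k=300.0):
    """V_BE increase (mV) that multiplies I_C ~ exp(q V_BE / (n k T)) by ten."""
    return 1e3 * n * (K_B * temperature_k / Q) * math.log(10.0)


low_level = slope_mv_per_decade(1)    # about 59.5 mV per decade
high_level = slope_mv_per_decade(2)   # about 119 mV per decade
```

The n = 2 value is the 120-mV/decade slope associated with high-level
injection.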

Thus to assure that the measured I_C is strictly due to
bipolar action, it is important that the base near both the top
and bottom Si-SiO2 interface is biased into the "flat-band"
condition. If these regions are depleted then what we are observing
is actually a surface MOSFET operating in weak inversion in
parallel with the bipolar transistor. It is interesting to note that
under these conditions, simple theory [4] predicts that the slope
of the drain current with respect to V_BE (this would be V_BS
in MOSFET convention) is also 60 mV/decade as for the
bipolar, and thus, as can be seen from Fig. 2(a), the slope cannot
be used to distinguish between MOSFET and BJT operation.
Since for electrons emitted into a depleted region there is very
little recombination, the apparent dc current gain β under this
condition can be very high. Thus of the I_C curves in Fig. 2(a),
the ones that correspond to true bipolar action are the ones
for which V_GB assures a neutral or accumulated surface
condition. In our case this occurs around V_GB = -3 V. Below this
value, the surface starts accumulating and the effect of
V_GB is screened out of the bulk base. Similarly the -10-V
bias of the back gate with respect to the base assures that the
back interface is accumulated, and, therefore, it does not
present a favorable current path.

It is important to stress at this point that any other way of
biasing our device would give erroneous results. For example,
if the top gate was left floating or, equivalently, it was not
there, as in most bulk lateral BJT's, then fixed surface charges
could deplete the surface and allow favorable surface current
emission. Similarly, if the gate was biased with respect to the
source rather than with respect to the bulk, a V_BE-dependent
depletion modulation could occur with concomitant effects
on the relationship between I_C and V_BE.

Although the collected current shows a strong dependence
on V_GB, I_B shows only a slight dependence and only at low
V_BE (Fig. 2(a)). This dependence is due to the fact that if the
base region under the gate is depleted due to an applied bias
V_GB, the base current will have a component arising from
recombination in this depleted region. Since the magnitude of
the recombination current is proportional to the depth of the
depletion region, I_B will be a function of
V_GB since the depth of the depleted region itself depends on
V_GB. However, it is interesting to note that this surface effect
is quite small, indicating a low density of surface
recombination centers at the top Si-SiO2 interface.

Both the collector-current and the base-current characteristics
exhibit considerable change in their respective slope at
increased V_BE. Considering at first only the currents due to
true lateral bipolar action, there are two effects both of
which cause degradation of the current characteristics. First,
the collector-current slope changes from 60 to 120 mV/decade
due to the onset of high-level injection in the vicinity
of V_BE = 0.7 V. From classical theory, for a uniformly doped
base the onset condition is defined as

\[ V_{BE} = 2\,\frac{kT}{q}\,\ln\frac{N_A}{n_i} \qquad (1) \]

The value of N_A = 1 × 10^16 cm^-3 from the above definition
corresponds reasonably well with the average base doping
obtained from V_T versus V_BE observations and from SUPREM
simulation; the fact that the base doping is not uniform
perpendicularly to current flow does not significantly alter the
validity of this discussion. On the other hand, the base-current
slope degradation is not due to high-level injection; rather, a
high series resistance of about 10 kΩ from the edge of the
base to the contact causes the actual base-emitter junction
voltage to be less than the externally applied value of V_BE,
for applied V_BE greater than approximately 0.75 V. Hence,
for V_BE > 0.75 V, I_C also degrades because of this base series
resistance effect.
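
The onset condition V_BE = 2 (kT/q) ln(N_A/n_i) can be inverted to estimate
the base doping from the observed onset voltage. The Python sketch below is a
worked calculation under stated assumptions: a temperature of 300 K and an
intrinsic concentration n_i = 1.45 × 10^10 cm^-3 are assumed here and are not
taken from the text, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q = 1.602176634e-19     # elementary charge, C
N_I = 1.45e10           # assumed intrinsic carrier concentration, cm^-3


def onset_voltage(n_a, n_i=N_I, temperature_k=300.0):
    """V_BE (V) at the onset of high-level injection for base doping n_a."""
    return 2.0 * (K_B * temperature_k / Q) * math.log(n_a / n_i)


def doping_from_onset(v_be, n_i=N_I, temperature_k=300.0):
    """Base doping (cm^-3) implied by an observed onset voltage."""
    return n_i * math.exp(v_be / (2.0 * K_B * temperature_k / Q))


n_a = doping_from_onset(0.7)   # on the order of 1e16 cm^-3
```

With these assumed constants an onset near 0.7 V corresponds to a doping of
roughly 10^16 cm^-3; the result shifts with the value chosen for n_i.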

Now, consider the current flow attributable to the merged
MOSFET device associated with the top Si-SiO2 interface.
From Fig. 2(a), it can be seen that the degradation of current
slope of the merged MOSFET begins not at a fixed V_BE but at
a fixed value of collected current; interestingly, at
approximately the value of I_C at which the true bipolar device enters
the high-level injection region. It can be shown easily that this
change of slope is due to onset of moderate inversion (i.e., end
of the weak inversion) of the merged MOSFET device [5];
hence, the onset of moderate inversion for the MOSFET
occurs at the same value of I_C as the onset of high-level
injection for the BJT. This interesting observation can be easily
shown to be predictable from straightforward device theory
[4].

It is now important to analyze the differences in I-V
characteristics due to varying base widths. As can be seen by
comparing Fig. 2(a) to Fig. 2(b), the value of I_B for a 5-µm
base-width device is lower than the corresponding value for a
10-µm base-width device at the same V_BE. This is
due to less recombination in the narrower base. Accordingly,
the dc common emitter current gain (β) increases from a
value of 0.5 for the 10-µm device to above 2.5 for the 5-µm
device. This indicates that the base current, for the most part,
is due to bulk minority recombination. Using standard BJT
theory and the two values of base width, a characteristic
diffusion length (L_n) is determined to be approximately 5 µm.
Assuming a value of D_n = 10 cm^2/s for the electron diffusion
coefficient, we obtain an approximate value of τ_n ≈ 3 × 10^-8 s
for the minority-carrier lifetime (electrons).
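
The lifetime follows from the standard relation L_n = sqrt(D_n τ_n), so
τ_n = L_n^2 / D_n. The Python lines below simply carry out that arithmetic
with the values quoted above; the function name is hypothetical.

```python
def lifetime_from_diffusion_length(l_n_um, d_n_cm2_per_s):
    """Minority-carrier lifetime (s) from L_n in micrometres and D_n in cm^2/s."""
    l_n_cm = l_n_um * 1e-4          # 1 µm = 1e-4 cm
    return l_n_cm ** 2 / d_n_cm2_per_s


tau_n = lifetime_from_diffusion_length(5.0, 10.0)   # 2.5e-8 s
```

The result, 2.5 × 10^-8 s, is consistent with the approximate value quoted in
the text.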

It is not clear at this point whether the observed low
lifetime is an inherent property of our thin films, or whether it is
due to subsequent processing. Lifetimes in the microsecond
range in SOI have been reported previously [1], although they
were measured using a MOS deep-depletion capacitance
recovery technique. Thus it appears that higher β's can be achieved
in the future using these SOI films. Nevertheless, modifications
in the device design to allow scaling of the base width to less
than 1 µm, while maintaining a low base series resistance, can
yield devices with β's of about 20 even with the present low
lifetimes.

Abstract

This paper presents the approach of MIT's Placement-Interconnect
(PI) Project to routing noncrossing VDD and GND
trees in single-layer metal. The input to the power-ground phase
is a set of rectangular modules on a rectangular chip. There
is one VDD pad, one GND pad, and each module has one
VDD terminal, one GND terminal, and a current requirement.
The power-ground phase calculates a cycle that passes through
every module once, dividing the VDD terminals from the GND
terminals. This splits the chip into a VDD region and a GND
region. Signal-routing techniques find a short Steiner tree in the
VDD region that connects the VDD terminals to the VDD pad.
This Steiner tree consists of minimum width metal wires. The
same techniques route the GND tree. Using each module's current
requirement, tree traversal determines the current requirement and
required width of each wire of the trees. Techniques used to widen
overcrowded channels widen the wires, producing the final VDD
and GND trees.


Introduction

MIT's  Placement-Interconnect  (PI)  Project,  described  by 
Prof.  Ronald  Rivest  at  the  19th  DAC,  is  automating  the  placement 
and  interconnect  phases  of  IC  design  of  custom  NMOS  or  CMOS 
chips  designed  under  the  rules  of  Mead  and  Conway.  Input  to 
PI  consists  of  a  description  of  the  rectangular  modules  and  pads 
to  appear  on  the  chip  and  a  set  of  nets,  where  each  net  is  a 
set  of  terminals  on  modules  and  pads  to  be  connected  by  wire. 
Each  chip  has  one  VDD  net,  one  GND  net,  and  many  signal  nets. 
To produce the description of the routed chip, PI goes through
the following stages: placing the modules and pads on the chip,
routing the VDD and GND nets, routing the signal nets, and
compacting the layout of the modules, pads, and wires.

At the end of the placement phase, PI knows the current
requirements of each terminal and knows the position of the one
VDD pad, the one GND pad, the signal pads, the modules, and
the terminals.

This paper describes PI's techniques to route the VDD
and GND nets, which is the goal of PI's second phase, the
power-routing phase. Power-routing lays wires to connect the
VDD terminals to each other and the GND terminals to each
other. These wires form a tree with its base at the pad and its
leaves at the modules' terminals. Each wire of the tree must be
wide enough to carry the current that might flow through it.
Drawing both trees in the metal layer requires that they do not
cross.
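
The width requirement can be met with a single traversal of a routed tree:
the current in a wire is the sum of the requirements of the terminals below
it. The Python sketch below is an illustration only and is not PI's code. The
data layout, the function names, and the linear rule relating width to current
(the `amps_per_unit_width` parameter) are assumptions made for this example.

```python
def wire_currents(children, terminal_current, root):
    """Current carried by the wire that feeds each node of a power tree.

    children maps a node to the nodes it feeds; terminal_current maps a
    module terminal to its current requirement. The entry for the root is
    the total current drawn at the pad.
    """
    current = {}

    def visit(node):
        total = terminal_current.get(node, 0.0)
        for child in children.get(node, []):
            total += visit(child)
        current[node] = total
        return total

    visit(root)
    return current


def wire_widths(current, min_width, amps_per_unit_width):
    """Widen each wire beyond the minimum width when its current demands it."""
    return {node: max(min_width, amps / amps_per_unit_width)
            for node, amps in current.items()}


# Hypothetical tree: the pad feeds a Steiner point, which feeds two modules.
tree = {"pad": ["steiner"], "steiner": ["m1", "m2"]}
needs = {"m1": 2.0, "m2": 3.0}
widths = wire_widths(wire_currents(tree, needs, "pad"), 3.0, 1.0)
```

In this example the wires to the two modules stay at the minimum width, while
the wire feeding the Steiner point is widened to carry the combined current.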


We want the total length of the two trees to be as small as
possible. Using less of the metal layer for power wires leaves more
metal for the signal-routing phase, enhancing its performance.
The signal-routing phase divides the chip area not occupied by
pads, modules, or power wires into rectangular regions called free
channels and uses these free channels to determine where to lay
signal wires. Short, simple, regular power trees produce regular
free channels that are more nearly square. Dealing with such free
channels enhances the signal-router's performance.

History 

Zahir Syed and Abbas El Gamal's algorithm grows
interdigitated trees. Applying "traffic rules" to the free channels
between modules prevents the trees from crossing. Another possible
approach routes one tree, minimizing its length, and then routes
the other tree without crossing the first. This approach then
rearranges separate branches of each tree, hoping to shorten the
second tree without greatly lengthening the first. Other approaches
grow the trees simultaneously. Rothermel and Mlynski grow one
tree from the left, the other from the right. Another approach
grows, at each step, the branch of a tree that would least hinder
the growing of the other tree.

Using  a  Hamiltonian  cycle  to  divide  the  chip  into  regions 

Pi’s  power-routing  phase  divides  the  chip  into  a  VDD 
region  and  a  GND  region  and  then  routes  each  net  within  the 
appropriate  region.  To  see  the  relationship  between  the  layout  of 
power  trees  and  a  cycle  that  passes  through  every  module  exactly 
once,  consider  a  chip  with  the  power  trees  already  laid.  Imagine 
standing  on  a  module  with  the  module’s  VDD  terminal  on  your 
right  and  its  GND  terminal  on  your  left.  When  you  try  to  walk 
to  another  module,  keeping  the  VDD  wires  on  your  right  and 
the  GND  wires  on  your  left  will  determine  the  next  module  you 
encounter.  Continuing  the  walk  takes  you  through  every  module 
and  back  to  where  you  started.  A  layout  of  power  trees  thus 
determines a Hamiltonian cycle. PI's power-routing phase first
draws a Hamiltonian cycle and lets this cycle determine the layout
of the power trees.

Several  characteristics  of  the  cycle  are  closely  related  to 
the  quality  of  the  corresponding  tree  layout.  To  ensure  that 
the  chip  is  divided  into  two  regions,  the  cycle  must  not  cross 
itself.  The  two  regions  produced  by  a  shorter  cycle  have  simpler 
shapes,  and  routing  in  such  regions  results  in  better  trees.  These 
considerations  lead  us  to  find  as  short  a  Hamiltonian  cycle  as  can 
be  found  in  a  reasonable  amount  of  time. 

Finding  a  short  Hamiltonian  cycle 

When using an algorithm to find a short Hamiltonian cycle,
we must define the distance from one module to another. In
keeping with our notion of traveling with the VDD wires on
our right, we imagine that the Hamiltonian cycle will leave a
module at a point, called an OUT point, on the perimeter of
the module halfway between the terminals. The OUT point is
counterclockwise from the VDD terminal and clockwise from the
GND terminal. Every module also has an IN point (clockwise from
the VDD terminal, counterclockwise from the GND terminal).
The distance from Module A to Module B is the Manhattan
distance (change in x-coordinate plus change in y-coordinate)
from A's OUT point to B's IN point. Note that this definition of
distance is not symmetric.
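
The asymmetry of this distance is easy to see in a small sketch. The module records, coordinate values, and function names below are hypothetical and chosen only for illustration; they are not taken from PI.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def distance_matrix(modules):
    """dist[a][b] is the distance from module a's OUT point to module b's IN point."""
    n = len(modules)
    return [
        [0 if a == b else manhattan(modules[a]["out"], modules[b]["in"])
         for b in range(n)]
        for a in range(n)
    ]


# Two hypothetical modules, each with its own OUT and IN point.
modules = [
    {"out": (0, 0), "in": (0, 2)},
    {"out": (5, 1), "in": (5, 3)},
]
dist = distance_matrix(modules)
```

Here the distance from module 0 to module 1 differs from the distance back, because each direction uses a different pair of points.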

There are many algorithms to find a short Hamiltonian cycle.³
PI currently uses Shen Lin's⁴:

Start with a random, directed Hamiltonian cycle.

Delete three edges.

Reconnect the segments to form a new Hamiltonian
cycle.

If the new cycle is shorter than the original, look for
three edges to delete from the new. If not, look
for another set of three edges to delete from
the original. If trying all possible sets of three
edges fails to produce a shorter cycle, accept the
original cycle as a reasonably short Hamiltonian
cycle.
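
The steps above can be sketched as follows. This is a minimal illustration, not PI's implementation. It assumes that the segments are reconnected without reversing any of them (the distance is not symmetric, so reversing a segment would change its length), which leaves exactly one new cycle per set of three deleted edges. The function names and the seed parameter are hypothetical.

```python
import itertools
import random


def cycle_length(order, dist):
    """Total length of the directed cycle that visits modules in this order."""
    n = len(order)
    return sum(dist[order[i]][order[(i + 1) % n]] for i in range(n))


def three_opt_directed(dist, seed=0):
    """Improve a random directed cycle by repeated three-edge exchanges."""
    n = len(dist)
    order = list(range(n))
    random.Random(seed).shuffle(order)          # random starting cycle
    best = cycle_length(order, dist)
    improved = True
    while improved:
        improved = False
        # Cutting before positions i < j < k deletes three edges and leaves
        # the segments order[i:j], order[j:k], and the rest of the cycle.
        for i, j, k in itertools.combinations(range(1, n + 1), 3):
            # Swap the two inner segments; every segment keeps its direction.
            candidate = order[:i] + order[j:k] + order[i:j] + order[k:]
            length = cycle_length(candidate, dist)
            if length < best:
                order, best = candidate, length
                improved = True
                break                            # restart from the new cycle
    return order, best
```

The loop stops only when no set of three edges yields a shorter cycle, so the result is a local optimum rather than a guaranteed shortest cycle.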

Routing  the  Hamiltonian  cycle 

The preceding algorithm gives the order in which the cycle
traverses the modules but does not completely determine the
cycle. We regard each edge of the cycle as a net of two "terminals"
(one module's OUT point and the other's IN point) and route
it using signal-routing techniques. These techniques minimize the
wire lengths and the number of jogs, which is what we want. In
routing the cycle's edges, no wire is allowed to cross a wire of
a previously laid edge. This ensures that the final cycle will not
cross itself.

We must decide the order in which we route the segments of
the Hamiltonian cycle. Laying the wires of one edge could block a
path that would provide a short routing of a later edge. In routing
this later edge, we may need to lay long wires to avoid crossing
the previously laid wires. This effect is more noticeable with
shorter edges. For a longer edge, when one module's OUT point
is far from the other module's IN point, there is a greater choice
of paths connecting these points using wires of approximately the
same length, so cutting off one of these paths is less likely to
significantly increase the final length of the tree. We therefore
route the edges of the Hamiltonian cycle in ascending order
according to the length of the edges.

Routing the VDD and GND nets

When the Hamiltonian cycle is completely routed, all
terminals of one power net will be inside the cycle, and all terminals
of the other power net will be outside.

We first route the net inside the Hamiltonian cycle. To do
this, we use signal-routing techniques that find a short Steiner
tree that spans the terminals. We restrict this routing so that it
doesn't cross the Hamiltonian cycle.

We then delete the Hamiltonian cycle and use signal-routing
techniques to route the other net. We restrict this routing so that it
doesn't cross the wires of the previously laid tree. After this routing
has been accomplished, we have two noncrossing, interdigitated
trees of minimum-length wires. The following example shows the
successive stages of input, drawing the Hamiltonian cycle, routing
the VDD tree, and erasing the Hamiltonian cycle and routing the
GND tree.
Determining wire width

Tree traversal determines how much current might flow
through each wire of the tree. We regard the layout of wires
of each net as a tree with its base at the power pad and its
leaves at the modules' terminals. User input gives each terminal's
maximum current requirement. The maximum current of a wire
ending in a terminal is the maximum current of the terminal.
The maximum current of any other wire is the sum of its children's
maximum currents. After every wire's maximum current is known,
multiplying by a design-rule constant gives every wire's minimum
width.
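
A minimal sketch of this traversal follows. The tree encoding (a dictionary from each wire to the wires it feeds), the wire names, the current values, and the constant 0.5 are all hypothetical; the source does not specify PI's data structures or its design-rule constant.

```python
def wire_currents(children, terminal_current, root):
    """Return {wire: maximum current} for every wire in the power tree.

    children maps a wire to the wires it feeds. A wire with no children
    ends in a terminal whose requirement is given in terminal_current.
    """
    current = {}

    def visit(wire):
        kids = children.get(wire, [])
        if kids:
            # Interior wire: it must carry everything its children carry.
            current[wire] = sum(visit(kid) for kid in kids)
        else:
            # Leaf wire: it carries the terminal's maximum current.
            current[wire] = terminal_current[wire]
        return current[wire]

    visit(root)
    return current


def minimum_widths(current, width_per_unit_current):
    """Scale each wire's maximum current by a design-rule constant."""
    return {wire: amps * width_per_unit_current for wire, amps in current.items()}


# Hypothetical tree: the pad feeds wires a and b; a feeds terminals t1 and t2.
children = {"pad": ["a", "b"], "a": ["t1", "t2"]}
terminal_current = {"t1": 2, "t2": 3, "b": 1}      # for example, in mA
current = wire_currents(children, terminal_current, "pad")
widths = minimum_widths(current, 0.5)
```

The wire at the pad ends up widest because it carries the sum of all terminal currents.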


Widening the wires


PI's stretcher widens each wire at least to its minimum
width. The stretcher has algorithms to widen free channels when
too many wires are squeezed into too little space. It uses these
same algorithms to widen the power wires.


We  regard  each  power  wire  as  a  long,  thin  rectangle  and  then 
regard  each  rectangle  as  a  channel.  We  widen  the  channel,  which 
means  stretching  the  channel  in  the  direction  perpendicular  to 
the current flow. Tree traversal shows the channel sides through
which  current  enters  and  leaves  the  channel.  If  these  sides  are 
opposite  each  other,  the  direction  of  current  flow  is  obvious  and 
we  stretch  only  in  the  direction  perpendicular  to  this.  If  not,  we 
stretch  in  both  directions. 
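
The rule for choosing the stretch direction can be written down directly. The side names and the function below are hypothetical and only illustrate the decision; they do not describe the stretcher's actual interface.

```python
OPPOSITE = {"left": "right", "right": "left", "top": "bottom", "bottom": "top"}


def stretch_directions(entry_side, exit_sides):
    """Return the set of directions ('x', 'y') in which to stretch a wire rectangle.

    entry_side is the side through which current enters; exit_sides lists
    the sides through which it leaves.
    """
    if set(exit_sides) == {OPPOSITE[entry_side]}:
        # Current flows straight through, so widen perpendicular to the flow.
        return {"y"} if entry_side in ("left", "right") else {"x"}
    # The wire bends or branches: the flow direction is ambiguous.
    return {"x", "y"}
```

A horizontal wire (current entering on the left and leaving on the right) is stretched only in y; a corner or a branch point is stretched in both directions.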


We stretch horizontally (in the x-direction) and then vertically
(in the y-direction). Each direction requires a separate sequence
of deciding whether or not this channel should be stretched in
this direction, calculating the channel's minimum width, and then
applying the stretching algorithms.


The result of stretching in both directions is the final power
trees.


References


¹Zahir A. Syed and Abbas El Gamal. "Single Layer Routing of
Power and Ground Networks in Integrated Circuits," Journal of
Digital Systems, Volume VI, Number 1, Spring, 1982, pages 53-63.

²H-J. Rothermel and D. A. Mlynski. "Computation of Power
Supply Nets in VLSI Layout," ACM IEEE Eighteenth Design
Automation Conference Proceedings, 1981, pages 37-47.

³See bibliography in Nicos Christofides. "The Traveling Salesman
Problem," Chapter 6, Combinatorial Optimization, Wiley & Sons,
1979, pages 148-149.

⁴Shen Lin. "Some Computer Solutions of the Traveling-Salesman
Problem," BSTJ 44, 2345-2369 (1965).




Paper  46.6 


Signal  Delay  in  RC  Tree  Networks 

JORGE  RUBINSTEIN,  member,  ieee,  PAUL  PENFIELD,  JR.,  fellow,  ieee, 
Abstract-In MOS integrated circuits, signals may propagate between
stages with fanout. The exact calculation of signal delay through such
networks is difficult. However, upper and lower bounds for delay that
are computationally simple are presented in this paper. The results can
be used 1) to bound the delay, given the signal threshold, or 2) to
bound the signal voltage, given a delay time, or 3) to certify that a circuit
is "fast enough," given both the maximum delay and the voltage
threshold.

I. Introduction

IN MOS INTEGRATED CIRCUITS, a given inverter or logic
node may drive several gates, some of them through long
wires whose distributed resistance and capacitance may not be
negligible. There does not seem to be reported in the literature
any simple method for estimating signal propagation delay
in such circuits, nor is there any general theory of the
properties of RC trees, as distinct from RC lines. This paper
presents a computationally simple technique for finding upper
and lower bounds for the delay. The technique is of importance
for VLSI designs in which the delay introduced by the
interconnections may be comparable to or longer than active-device
delay. This can be the case for wiring lengths as short
as 1 mm, with 4-μm minimum feature size. The importance
of this technique grows as the wiring lengths increase or the
feature size decreases.

Consider the circuit of Fig. 1. The slowest transition (and
therefore presumably the one of most interest) occurs when
the driving inverter shuts off and its output voltage rises from
a small value to $V_{DD}$. During this process, the various parasitic
capacitances on the output are charged through the pullup
transistor. Fig. 2 shows a simple model of this circuit for timing
analysis. The pullup, which is nonlinear, is approximated
by a linear resistor, and the transition is represented by a voltage
source going from 0 (or a low value) to $V_{DD}$ at time t = 0.
(Later, for simplicity, a unit step will be considered instead.)
The polysilicon lines are represented by uniform RC lines.
The resistance of the metal line is neglected, but its parasitic
capacitance remains. Capacitances associated with the pullup
source diffusion, contact cuts, and the gates being driven are
included. Any nonlinear capacitances are approximated by
linear ones.

Fig. 1. Typical MOS signal-distribution network. The inverter is shown
driving three gates, through a fanout network implemented in polysilicon
and metal.

Fig. 2. Linear-circuit model for the network of Fig. 1. The voltage
source is a step at time t = 0.

If all the resistances except the pullup can be neglected, then
all the capacitors can be lumped together, and the circuit response
may be found in closed form. The voltages at all the
outputs are the same:

$$v_{out}(t) = V_{DD}\left(1 - e^{-t/RC_T}\right) \tag{1}$$

where R is the pullup resistance and $C_T$ the total capacitance.
Thus at a given time T, the output voltage $v_{out}(T)$ is given by
(1), and the time at which $v_{out}(t)$ reaches some specified critical
voltage $V_{CR}$ is given by

$$t = RC_T \ln\frac{V_{DD}}{V_{DD} - V_{CR}} \tag{2}$$
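
As a quick numerical check of the lumped case, the sketch below evaluates the single-time-constant response and its inverse. The function names and the component values in the usage note are illustrative assumptions, not values from the paper.

```python
import math


def lumped_voltage(t, r_pullup, c_total, v_dd=1.0):
    """Output voltage at time t when only the pullup resistance matters."""
    return v_dd * (1.0 - math.exp(-t / (r_pullup * c_total)))


def lumped_delay(v_cr, r_pullup, c_total, v_dd=1.0):
    """Time at which the output reaches the critical voltage v_cr."""
    if not 0.0 <= v_cr < v_dd:
        raise ValueError("v_cr must satisfy 0 <= v_cr < v_dd")
    return r_pullup * c_total * math.log(v_dd / (v_dd - v_cr))
```

For example, with a hypothetical 10 kΩ pullup, 1 pF of total capacitance, and a threshold of half the supply, `lumped_delay(2.5, 1e4, 1e-12, 5.0)` is RC ln 2, roughly 6.9 ns.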


However, if the resistances of the lines are comparable to
that of the pullup, this solution is not correct. The circuit response
cannot generally be calculated in closed form. The results
below can be used to calculate upper and lower bounds
to the delay that are very tight in the case where most of the
resistance is in the pullup. The theory as presented here does
not explicitly deal with nonlinearities and therefore does not
apply to signal propagation through pass transistors.

Previous work on distributed RC circuits is summarized in
the extensive bibliographies of Ghausi and Kelly [1] and
Kumar [2]. There does not appear to be any treatment of
RC trees, as distinct from RC lines, in these bibliographies.
Perhaps the most complete treatment of the properties of RC
lines is that of Protonotarios and Wing [3], [4]; some (but
not all) of the theorems proved there also apply to RC trees.
Most of the work cited deals with techniques to approximate
the response of such networks, rather than to find bounds;
an exception is that of Singhal and Vlach [5], [6]. An important
early analytical approximation to delay is that by
Elmore [7], who called the first moment of the impulse response
the delay. This definition is inadequate because it
does not define delay in terms of signal threshold.

Preliminary, restricted versions of some of the results given
below have been presented before by the two senior authors
[8], [9], and utilized in at least two working timing analyzers
[10]-[12]. The junior author simplified the derivation and
tightened some of the bounds.

II.  Statement  of  the  Problem 

An RC tree, as considered in this paper, is a generalization
of the well-known RC lines [3], [4]. It may be defined recursively
as follows. There are three primitive elements. First,
a lumped capacitor between ground and another node is an RC
tree. Second, a lumped resistor between two nonground nodes
is an RC tree. Third, a (distributed) RC line, uniform or nonuniform,
in the configuration with no dc path to ground, is
an RC tree. Finally, any two RC trees with common ground,
and one nonground node from each connected together, form
a new RC tree. This definition does not permit resistor loops,
so that the resistors (including those in the distributed RC
lines) form a topological tree that does not include the ground
node. All of the capacitors (including the distributed capacitances
in the RC lines) are connected to ground. One of the
nonground nodes of the final tree is assumed to be the input,
and one or more nodes the outputs.

In many cases, each branch of the tree except the input terminates
in an output; however, this is not required, and in this
paper the outputs may be defined at any of the nonground
nodes.

As  a  consequence  of  this  definition,  there  is  a  unique  path 
through  the  resistive  part  of  the  network  from  any  nonground 
node  to  the  input. 

For simplicity, most of the theory below will be presented
for the special case with only lumped resistors and capacitors.
However, the generalization to include distributed RC lines
(uniform or nonuniform) is straightforward. All the results
apply in the form given, except that the summations in the
formulas for $T_P$, $T_{Di}$, and $T_{Ri}$ are replaced by a combination
of summations and integrals. The easiest way to picture the
result is to think of each RC line as represented by a finite
number of lumped RC sections, so that the derivations apply,
and then consider the limit as the number of sections used
to represent each line goes to infinity. All the summations
are well behaved in the limit. The required integrals are given
explicitly in Appendix A for both uniform and nonuniform
distributed lines.

The  RC  tree  representing  the  signal  path  is,  without  loss  of 
generality,  assumed  to  be  driven  at  the  input  with  a  unit  step 
voltage (henceforth all voltages may be thought of as normalized
to the magnitude of the step excitation). Gradually
the  voltages  at  all  other  nodes,  and  in  particular  at  all  the 
outputs,  rise  from  0  to  1  V.  It  is  assumed  that  the  output 
voltages  cannot  be  calculated  easily.  The  problem  is  to  find 
simple  upper  and  lower  bounds  for  the  output  voltages,  or, 
equivalently,  to  find  upper  and  lower  bounds  for  the  delay 
associated  with  each  output. 

III.  Analytical  Theory 

Consider any output node i (in this paper, i will be used as
an index selecting an output node) and any lumped capacitor
at node k with capacitance $C_k$. The resistance $R_{ki}$ is defined
as the resistance of the portion of the (unique) path between
the input and i that is common with the (unique) path between
the input and node k. In particular, $R_{ii}$ is the resistance
between input and output i, and $R_{kk}$ is the resistance between
the input and node k. Thus $R_{ki} \le R_{kk}$ and $R_{ki} \le R_{ii}$. For an
example, see Fig. 3.

Fig. 3. Illustration of resistance terms. For this network,
$R_{ki} = R_1 + R_2$, $R_{kk} = R_1 + R_2 + R_3$, $R_{ii} = R_1 + R_2 + R_5$.

The sum (over all the capacitors in the network)

$$T_P = \sum_k R_{kk} C_k \tag{3}$$

has the dimensions of time. Next, define for each output i
two quantities that also have the dimensions of time

$$T_{Di} = \sum_k R_{ki} C_k \tag{4}$$

$$T_{Ri} = \frac{\sum_k R_{ki}^2 C_k}{R_{ii}} \tag{5}$$

These quantities play a role in the final delay formulas but
none of them is equal to the delay, although $T_{Di}$ is equal to
the first-order moment of the impulse response (see Appendix
B), which has been called "delay" by Elmore [7]. Note that
the network has one value of $T_P$, but each output of the network
has a separate $T_{Di}$ and $T_{Ri}$. It is easily shown from the
definitions that

$$T_{Ri} \le T_{Di} \le T_P. \tag{6}$$

For RC trees without side branches, $T_{Di} = T_P$. An interpretation
of $T_P$ and $T_{Di}$ in terms of the system function of the network
appears in Appendix B.
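
The three characteristic times can be evaluated directly from their definitions for a lumped tree. The sketch below is illustrative only: the tree encoding (each node mapped to its parent and the resistance between them), the node names, and the element values are assumptions made for the example, and the shared resistance is taken to be the sum of the resistors common to both paths to the input.

```python
def path_to_input(node, parent):
    """Edges from node up to the input, each as (child_node, resistance)."""
    edges = []
    while node in parent:
        up, resistance = parent[node]
        edges.append((node, resistance))
        node = up
    return edges


def shared_resistance(k, i, parent):
    """R_ki: resistance common to the paths from the input to k and to i."""
    on_path_to_i = {child for child, _ in path_to_input(i, parent)}
    return sum(r for child, r in path_to_input(k, parent) if child in on_path_to_i)


def characteristic_times(parent, cap, i):
    """Return (T_P, T_Di, T_Ri) for output node i."""
    r_ii = shared_resistance(i, i, parent)
    t_p = sum(shared_resistance(k, k, parent) * c for k, c in cap.items())
    t_d = sum(shared_resistance(k, i, parent) * c for k, c in cap.items())
    t_r = sum(shared_resistance(k, i, parent) ** 2 * c for k, c in cap.items()) / r_ii
    return t_p, t_d, t_r


# Hypothetical lumped tree: input -15-> n1, n1 -8-> n5, n1 -3-> n12.
parent = {"n1": ("in", 15), "n5": ("n1", 8), "n12": ("n1", 3)}
cap = {"n1": 2, "n5": 7, "n12": 9}
t_p, t_d5, t_r5 = characteristic_times(parent, cap, "n5")
```

For this example the sketch gives $T_P = 353$, $T_{D5} = 326$, and $T_{R5} \approx 268.6$, which respects the ordering of the three times. Each output costs one pass over all capacitors, and each pass walks paths to the input.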

The voltage at each output i (and in fact at each node) is a
monotonic function of time during the transient, as proved
in Appendix C. Also, the analog of the well-known fact that
voltage along an RC line is a concave function of distance
(suitably defined) is the following general result (proved in
Appendix D):

$$R_{ii}[1 - v_k(t)] \ge R_{ki}[1 - v_i(t)]. \tag{7}$$

A similar result is found by interchanging i and k subscripts

$$R_{kk}[1 - v_i(t)] \ge R_{ki}[1 - v_k(t)]. \tag{8}$$

These  results  apply  to  any  output  i  and  any  node  k,  whether 
the  output  is  “upstream”  or  “downstream”  from  the  node  k. 

At any instant of time, the voltage difference between the
input and any output i may be calculated by summing the
voltage drops along the (unique) path between input and output.
Each such drop may be expressed as the resistance times
the current feeding all "downstream" capacitors. Alternatively,
this double sum may be expressed as a sum over all capacitors
in the network, of the current through each capacitor
times that portion of the "upstream" resistance that also happens
to lie along the path to the output i. This resistance is
what has been defined as $R_{ki}$, so

$$1 - v_i(t) = \sum_k R_{ki} C_k \frac{dv_k}{dt}. \tag{9}$$

Fig. 4. Interpretation of $f_i(t)$ as the integral above the response $v_i(t)$.
Note that $f_i(\infty) = T_{Di}$.

Equation (9) is integrated between 0 and t, and the result
denoted $f_i(t)$:

$$f_i(t) = \int_0^t [1 - v_i(t')]\,dt' = \sum_k R_{ki} C_k v_k(t) = T_{Di} - \sum_k R_{ki} C_k [1 - v_k(t)]. \tag{10}$$

This integral plays a central role in the derivation of the
bounds. A graphical interpretation appears in Fig. 4, which
shows a typical step response. The area above the response
but below the unit input is $f_i(t)$. As t approaches infinity,
this approaches $T_{Di}$.

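If (7) and (8) are used in (10), the result is

$$T_{Ri}[1 - v_i(t)] \le T_{Di} - f_i(t) \le T_P[1 - v_i(t)] \tag{11}$$

which is equivalent to

$$\frac{T_{Di} - f_i(t)}{T_P} \le \frac{df_i(t)}{dt} \le \frac{T_{Di} - f_i(t)}{T_{Ri}} \tag{12}$$

which, when integrated between times $t_1$ and $t_2 \ge t_1$, yields

$$[T_{Di} - f_i(t_1)]\,e^{-(t_2 - t_1)/T_{Ri}} \le T_{Di} - f_i(t_2) \le [T_{Di} - f_i(t_1)]\,e^{-(t_2 - t_1)/T_P}. \tag{13}$$

Since $v_i(t)$ is monotonic nondecreasing

$$f_i(t_4) - f_i(t_3) \ge [1 - v_i(t_4)](t_4 - t_3) \tag{14}$$

for any nonnegative $t_3$ and $t_4$.

The voltage bounds are now easily derived. Of course

$$v_i(t) \ge 0 \tag{15}$$

but, in addition, from (11) and (14) with $t_3 = 0$ and $t_4 = t$

$$v_i(t) \ge 1 - \frac{T_{Di}}{t + T_{Ri}} \tag{16}$$

and, from the first inequality in (11), (14) with $t_3 = t - T_P + T_{Ri}$
and $t_4 = t$, and the second inequality in (13) with $t_1 = 0$
and $t_2 = t_3$

$$v_i(t) \ge 1 - \frac{T_{Di}}{T_P}\,e^{(T_P - T_{Ri})/T_P}\,e^{-t/T_P} \tag{17}$$

which holds only for $t \ge T_P - T_{Ri}$. The best lower bound is
(15) for $t \le T_{Di} - T_{Ri}$, (16) for $T_{Di} - T_{Ri} \le t \le T_P - T_{Ri}$, and
(17) for $T_P - T_{Ri} \le t$. The upper bounds on voltage are, from
(11) and (14) with $t_3 = t$ and $t_4 = 0$

$$v_i(t) \le 1 - \frac{T_{Di} - t}{T_P} \tag{18}$$

and, from the second inequality in (11), the first inequality in
(13) with $t_1 = T_{Di} - T_{Ri}$ and $t_2 = t$, and (14) with
$t_3 = T_{Di} - T_{Ri}$ and $t_4 = 0$

$$v_i(t) \le 1 - \frac{T_{Ri}}{T_P}\,e^{(T_{Di} - T_{Ri})/T_{Ri}}\,e^{-t/T_{Ri}} \tag{19}$$

which holds only for $t \ge T_{Di} - T_{Ri}$. The best upper bound for
voltage is (18) for $t \le T_{Di} - T_{Ri}$ and (19) for $T_{Di} - T_{Ri} \le t$.
Bounds for the time, given the voltage, are possible because
the voltage is a monotonic function of time. Of course

$$t \ge 0 \tag{20}$$

and in addition, (18) and (19) can be inverted to yield

$$t \ge T_{Di} - T_P[1 - v_i(t)] \tag{21}$$

$$t \ge T_{Di} - T_{Ri} + T_{Ri} \ln\frac{T_{Ri}}{T_P[1 - v_i(t)]} \tag{22}$$

and (16) and (17) yield

$$t \le \frac{T_{Di}}{1 - v_i(t)} - T_{Ri} \tag{23}$$

$$t \le T_P - T_{Ri} + T_P \ln\frac{T_{Di}}{T_P[1 - v_i(t)]} \tag{24}$$

where (22) applies only if $v_i(t) \ge 1 - T_{Ri}/T_P$, and (24) only
if $v_i(t) \ge 1 - T_{Di}/T_P$. The general form of all these bounds is
illustrated in Fig. 5.

Fig. 5. Form of the bounds, with the distances from the exact solution
exaggerated for clarity.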

These  bounds,  (15)-(19)  for  voltage,  and  (20)-(24)  for 
time,  constitute  the  major  result  of  this  paper. 
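
The bounds can be collected into four small functions. This is a sketch rather than the paper's APL: the function names are hypothetical, and the piecewise formulas in the comments are the forms obtained by following the derivation steps described above, namely $v \ge 1 - T_{Di}/(t + T_{Ri})$ and $v \le 1 - (T_{Di} - t)/T_P$ for early times, with exponential tails in $T_P$ and $T_{Ri}$ for late times. It assumes $0 \le v < 1$.

```python
import math


def v_min(t, t_p, t_d, t_r):
    """Best lower bound on the normalized output voltage at time t."""
    if t < t_d - t_r:
        return 0.0                                          # (15)
    if t <= t_p - t_r:
        return 1.0 - t_d / (t + t_r)                        # (16)
    return 1.0 - (t_d / t_p) * math.exp((t_p - t_r - t) / t_p)   # (17)


def v_max(t, t_p, t_d, t_r):
    """Best upper bound on the normalized output voltage at time t."""
    if t < t_d - t_r:
        return 1.0 - (t_d - t) / t_p                        # (18)
    return 1.0 - (t_r / t_p) * math.exp((t_d - t_r - t) / t_r)   # (19)


def t_min(v, t_p, t_d, t_r):
    """Best lower bound on the time at which the output reaches v."""
    if v < 1.0 - t_r / t_p:
        return max(0.0, t_d - t_p * (1.0 - v))              # (20), (21)
    return t_d - t_r + t_r * math.log(t_r / (t_p * (1.0 - v)))   # (22)


def t_max(v, t_p, t_d, t_r):
    """Best upper bound on the time at which the output reaches v."""
    if v < 1.0 - t_d / t_p:
        return t_d / (1.0 - v) - t_r                        # (23)
    return t_p - t_r + t_p * math.log(t_d / (t_p * (1.0 - v)))   # (24)
```

The time bounds are the inverses of the voltage bounds, so evaluating `v_max` at `t_min(v, ...)` or `v_min` at `t_max(v, ...)` returns `v` again whenever the exponential branches apply.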
Fig. 6. Elements sufficient for describing RC trees. For simplicity,
only uniform distributed RC lines are included. The parameters are
CAP, RES, and N, and the functions which return networks are C,
OUTPUT, R, and URC. The capacitor and output designation are
one-port elements, and the resistor and uniform line are two-port
elements.


IV.  Practical  Hierarchical  Algorithms 

Use of hierarchy is a powerful way to deal with complexity
in design of large systems. Computation cost is usually less
with hierarchical algorithms, and analysis of part of a design
can be done before the rest is known. In this section, programs
are given for calculating the voltage and time bounds
of this paper hierarchically. Although intended for exposition,
these programs are complete and do work. They may be used
interactively, without any changes whatever, for small or moderate
size networks, or, for large networks, they may be incorporated
into systems that deal with machine-readable network
descriptions.

One way to use the inequalities of this paper is to consider
the overall RC tree, and compute for each capacitor the appropriate
$R_{ki}$ and $R_{kk}$ so that $T_P$, $T_{Di}$, and $T_{Ri}$ for each output
can be found. Of course, for networks with distributed
lines, the sums are augmented with integrals as discussed in
Appendix A. In this approach, the calculations for each output
require time proportional to the square of the number of
elements.

An alternate scheme is to build up the network by construction,
and calculate independently for each of the partially
constructed networks enough information to permit the final
calculation of $T_P$, $T_{Di}$, and $T_{Ri}$. A recursive definition of RC
trees is given below, and if the network is expressed in these
terms rather than in the form of a schematic diagram, the resulting
expression can be used as a guide for the calculations.
The computation time for each output is proportional to the
number of elements, rather than the square of the number.
Programs that implement this approach appear below.

Fig. 6 shows the four building blocks: lumped capacitor,
lumped resistor, uniform RC line, and declaration of output.
The capacitor and the output label are considered as two-terminal,
or one-port networks. The RC line and the resistor
are considered as two-port networks. If desired, particular
nonuniform RC lines, such as exponentially or linearly tapered
lines, can be included also. Fig. 7 shows the five permissible
ways of wiring these building blocks, or previously wired subnetworks,
together. Any RC tree can be denoted by an expression
using only these wiring functions. The syntax shown
is identical to APL syntax, and the programs below are written
in APL. Note that Figs. 6 and 7 do not give a minimal set of
elements or wiring functions, since some can be expressed in
terms of others. The names for the wiring functions are taken
from the notation of the program MARTHA [13]-[16].

Fig. 7. Wiring functions for interconnecting elements or subtrees
(panels: A P B, A WT B, WTO A, WP A, A WC B). The
functions which return one-port networks are P, WT, and WTO, and
those that return two-port networks are WP and WC. Here A and B
are previously defined RC trees.

Fig. 8. Example network. Parameter values are in ohms and farads.
The characteristic times (in sec.) are $T_P = 419$, $T_{D5} = 386$,
$T_{R5} = 307.7$, $T_{D12} = 363$, and $T_{R12} = 335.2$.

Example: The network shown in Fig. 8 may be denoted

(R 15) WT (C 2) P ((R 8) WT ((C 7) P OUTPUT 5))
P ((URC 3 4) WC WP C 9) WT OUTPUT 12        (25)

and is a one-port network, with two declared outputs.

For  convenience,  this  notation  allows  a  network  with  only 
one  output  to  be  expressed  as  a  two-port  with  the  second  port 
an  implicit  output,  without  any  explicit  output  declaration. 
The  explicit  declaration  of  outputs  is  handy  because  often 
side  branches  do  not  represent  outputs  of  interest. 

If  an  expression  such  as  (25)  is  to  be  used  as  a  guide  for 
the  calculations,  then  each  function  shown  must  correspond 
to  the  calculation  of  partial  results  which  are  sufficient  to 
allow further calculations. The following information is adequate
at each stage in the construction of the network:

(1) Total capacitance $C_T$.

(2) $T_P$ of the network as constructed so far.

(3) For a two-port, considering port 2 as an implicit output,
$R_{22}$, $T_{D2}$, and $T_{R2}$. (For convenience, the product
$R_{22}T_{R2}$ is used in the programs below instead of $T_{R2}$.)

(4) For each declared output in a one-port or two-port network,
$R_{ii}$, $T_{Di}$, and $T_{Ri}$. (For convenience, $R_{ii}T_{Ri}$ is
used rather than $T_{Ri}$.)

(5) For each declared output in a two-port network, $R_{2i}$.

Each of the quantities identified above pertains to the particular
subnetwork and can be calculated from a knowledge of
that subnetwork alone, independent of how the subnetwork
may later be wired together with other subnetworks. As an
example of the use of these quantities during construction of
the network, consider the cascade operation WC. The objective
is to find $C_T$, $T_P$, $R_{22}$, $T_{D2}$, $T_{R2}$, and all $R_{ii}$, $T_{Di}$, $T_{Ri}$,
and $R_{2i}$ of the cascade A WC B from the corresponding quantities
for its two arguments A and B. The formulas for calculating
these are


$$C_T = C_{TA} + C_{TB} \tag{26}$$

$$T_P = T_{PA} + T_{PB} + R_{22A} C_{TB} \tag{27}$$

$$R_{22} = R_{22A} + R_{22B} \tag{28}$$

$$T_{D2} = T_{D2A} + T_{D2B} + R_{22A} C_{TB} \tag{29}$$

$$T_{R2} R_{22} = T_{R2A} R_{22A} + T_{R2B} R_{22B} + 2 R_{22A} T_{D2B} + R_{22A}^2 C_{TB} \tag{30}$$

$$R_{ii} = R_{iiA},\; R_{iiB} + R_{22A} \tag{31}$$

$$T_{Di} = (T_{DiA} + R_{2iA} C_{TB}),\; T_{DiB} + R_{22A} C_{TB} + T_{D2A} \tag{32}$$

$$T_{Ri} R_{ii} = (T_{RiA} R_{iiA} + R_{2iA}^2 C_{TB}),\; T_{RiB} R_{iiB} + T_{R2A} R_{22A} + 2 R_{22A} T_{DiB} + R_{22A}^2 C_{TB} \tag{33}$$

$$R_{2i} = R_{2iA},\; R_{2iB} + R_{22A}. \tag{34}$$

In (31)-(34) the comma separates two cases: the expression before it
applies to the declared outputs of A, and the expression after it to the
declared outputs of B.

Fig. 9. APL functions for the elements. The result statements that remain
legible in the listing are:

C CAP returns 1,CAP,0
OUTPUT N returns 1 0 0,N,0 0 0
R RES returns 2 0 0,RES,0 0
URC RES,CAP returns 2,CAP,(CAP×RES÷2),RES,(CAP×RES÷2),CAP×RES×RES÷3

The  corresponding  formulas  for  the  other  wiring  functions  are 
similar,  but  not  as  complicated. 
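
The construction scheme can be sketched in Python as follows. This is not a transcription of the APL listings, which are not legible in this copy. The dictionary layout and the helper names `one_port`, `two_port`, and `times` are hypothetical, and the rules used for P, WP, and WT are this sketch's own derivation from the definitions of the characteristic times (WT is treated as a cascade whose second port is then discarded). The network built at the end is the one in Fig. 8.

```python
def one_port(ct, tp, outs=None):
    # outs maps label -> (R_ii, T_Di, T_Ri*R_ii, R_2i); R_2i is 0 in a one-port.
    outs = {n: (o[0], o[1], o[2], 0.0) for n, o in (outs or {}).items()}
    return {"ct": ct, "tp": tp, "outs": outs}


def two_port(ct, tp, r22, td2, trr2, outs=None):
    return {"ct": ct, "tp": tp, "r22": r22, "td2": td2, "trr2": trr2,
            "outs": dict(outs or {})}


def C(cap):
    return one_port(cap, 0.0)


def OUTPUT(label):
    return one_port(0.0, 0.0, {label: (0.0, 0.0, 0.0, 0.0)})


def R(res):
    return two_port(0.0, 0.0, res, 0.0, 0.0)


def URC(res, cap):
    # Uniform line: T_P = T_D2 = RC/2 and T_R2*R_22 = R*R*C/3.
    return two_port(cap, res * cap / 2, res, res * cap / 2, res * res * cap / 3)


def P(a, b):
    # Parallel one-ports share only the input node, so nothing interacts.
    return one_port(a["ct"] + b["ct"], a["tp"] + b["tp"], {**a["outs"], **b["outs"]})


def WP(a):
    # One-port hung on a through wire: port 2 coincides with the input.
    return two_port(a["ct"], a["tp"], 0.0, 0.0, 0.0, a["outs"])


def WC(a, b):
    # Cascade of two two-ports, following formulas (26)-(34).
    r22a, ctb = a["r22"], b["ct"]
    outs = {}
    for n, (rii, tdi, trr, r2i) in a["outs"].items():
        outs[n] = (rii, tdi + r2i * ctb, trr + r2i ** 2 * ctb, r2i)
    for n, (rii, tdi, trr, r2i) in b["outs"].items():
        outs[n] = (rii + r22a, tdi + r22a * ctb + a["td2"],
                   trr + a["trr2"] + 2 * r22a * tdi + r22a ** 2 * ctb, r2i + r22a)
    return two_port(a["ct"] + ctb, a["tp"] + b["tp"] + r22a * ctb, r22a + b["r22"],
                    a["td2"] + b["td2"] + r22a * ctb,
                    a["trr2"] + b["trr2"] + 2 * r22a * b["td2"] + r22a ** 2 * ctb,
                    outs)


def WT(a, b):
    # Terminate two-port a with one-port b, then drop the port-2 data.
    w = WC(a, WP(b))
    return one_port(w["ct"], w["tp"], w["outs"])


def times(net, label):
    """Return (T_P, T_Di, T_Ri) for a declared output."""
    rii, tdi, trr, _ = net["outs"][label]
    return net["tp"], tdi, trr / rii


net = WT(R(15), P(C(2), P(WT(R(8), P(C(7), OUTPUT(5))),
                          WT(WC(URC(3, 4), WP(C(9))), OUTPUT(12)))))
```

Each wiring step touches only the summary data of its two arguments, which is why the cost per output grows linearly with the number of elements. For the Fig. 8 network, `times(net, 5)` and `times(net, 12)` give $T_P = 419$ with $T_{D5} = 386$, $T_{R5} \approx 307.7$, $T_{D12} = 363$, and $T_{R12} \approx 335.2$.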

A set of APL functions which implement this scheme appear
in Figs. 9-12. The necessary data is passed around in the form
of vectors. A one-port network is represented by the vector 1,
$C_T$, $T_P$, followed by zero or more sets of four numbers, $N_i$,
$R_{ii}$, $T_{Di}$, $T_{Ri}R_{ii}$. The number 1 starting off the vector is the
number of ports, and $N_i$ is the (numerical) label for each output.
It is not necessary to pass along the number of declared
outputs since that can be calculated from the length of the
vector. In a similar way, a two-port network is represented by
the vector 2, $C_T$, $T_P$, $R_{22}$, $T_{D2}$, $T_{R2}R_{22}$, followed by zero
or more sets of five numbers, $N_i$, $R_{ii}$, $T_{Di}$, $T_{Ri}R_{ii}$, and $R_{2i}$,
one set for each declared output.

The background functions in Fig. 12 provide 1) some error
control (with automatic abort in case of error), 2) calculation
of the number of ports of a network, 0 being returned for ill-formed
arguments, 3) a matrix with the data for the declared
outputs, and 4) extraction of $T_P$, $T_{Di}$, and $T_{Ri}$. The four elements
appear in Fig. 9, and the wiring functions in Fig. 10.
The listing of WC, for example, shows after error checking, the
calculation of the required output, term by term, from the
arguments. This function can be compared with (26)-(34).

Fig. 11 shows five functions intended to calculate the bounds
for any network. The convention followed is that if the argument
for any of these is a two-port network, the second port
is taken as the desired output, and the declared outputs are
ignored. If the argument is a one-port network, then the
declared outputs are used. The two functions TMIN and TMAX
calculate the lower and upper bounds for delay, and refer to a
global variable named V which contains the threshold, a number
(or array of numbers) between 0 and 1. The functions
VMIN and VMAX calculate the lower and upper bounds for
signal voltage and refer to a global variable T containing an
array of (positive) delay times. The final function, OK, refers to
both V and T and returns 1 if all is well, that is, if TMAX < T,
or -1 if the network definitely will fail, that is, if T < TMIN,
or 0 if the bounds are not tight enough to tell for sure, that is,
if TMIN < T < TMAX. An example of the use of these functions
to test the network in Fig. 8 is shown in Figs. 13 and 14.

Fig. 10. APL functions for the wiring functions.

Fig. 11. Response functions. The very small numbers in the functions
guard against errors for pathological networks and certain limiting
values for voltage and time.

Fig. 12. APL background functions to support the functions in Figs. 9,
10, and 11.

Fig. 13. Example of the use of the fast calculation scheme to find upper
and lower bounds on delay and response voltage.

Fig. 14. Upper and lower bounds for output 5, as calculated in Fig. 13.
The exact solution, found from circuit simulation, is shown also.
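The three-valued test performed by OK can be sketched in Python as follows. This is only an illustration of the decision rule stated in the text; the function name `ok` and its explicit arguments are assumptions (the APL original reads the global variables V and T instead).

```python
def ok(t_min, t_max, t_required):
    """Return 1, -1, or 0 following the rule described for the OK function.

    t_min, t_max: lower and upper delay bounds for the chosen threshold.
    t_required:   the delay time the design must meet.
    """
    if t_max < t_required:
        return 1      # even the worst-case delay is fast enough
    if t_required < t_min:
        return -1     # even the best-case delay is too slow
    return 0          # the bounds are not tight enough to decide
```

A result of 0 is the case where a more exact (and slower) simulation would be needed to settle the question.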

V.  Application  to  PLA  Speed  Estimates 
These bounds are applied, as an example, to polysilicon lines
driving the AND plane of a PLA, to determine whether or not
the dominant delay occurs here. It is assumed that a strong
superbuffer driver drives the line, and that every other minterm
has a transistor present. The gates are assumed to be 4
microns square, separated by 24 µm of RC line. The poly
resistance is assumed to be 30 Ω per square, the gate oxide
thickness 400 Å, and the field-oxide thickness 3000 Å.

These numbers lead to a capacitance of 0.01 pF and resistance
180 Ω between gates, and a resistance of 30 Ω and
capacitance of 0.013 pF for each gate. The network is driven
by a source resistance of 380 Ω and the effective capacitance
of the output of the driver is estimated as 0.04 pF.

A function which returns a network with N minterms is
shown in Fig. 15. The results of calculating the delay as a
function of the number of minterms are shown in Fig. 16. The
voltage threshold was taken to be 0.7 times $V_{DD}$. On this
log-log plot the quadratic dependence of delay on number of
minterms (as a measure of the length of the line) is evident. Also
evident is the fact that even with as many as a hundred
minterms, the delay is guaranteed to be no worse than 10 ns. This
suggests that the dominant delay in a PLA occurs elsewhere.

Fig. 15. APL function which returns a model of a PLA line with N
minterms.

Fig. 16. Upper and lower bounds on response time of the network of
Fig. 15, shown as a function of the number of minterms in the PLA.
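The PLA estimate can be approximated with a short Python sketch. It is not the APL function of Fig. 15: the line is lumped into one resistor-capacitor section per line segment and per gate (an assumption that makes the numbers differ somewhat from Fig. 16), one gate is placed for every two minterms, and the names `pla_line` and `delay_bounds` are hypothetical. The delay formulas are the standard ones obtained by inverting the RC-tree voltage bounds.

```python
import math

def pla_line(n_minterms, r_src=380.0, c_drv=0.04e-12,
             r_line=180.0, c_line=0.01e-12,
             r_gate=30.0, c_gate=0.013e-12):
    """Return (resistance from source, capacitance) for each lumped node."""
    nodes = [(r_src, c_drv)]
    r = r_src
    for _ in range(n_minterms // 2):   # every other minterm has a transistor
        r += r_line
        nodes.append((r, c_line))      # lumped line segment
        r += r_gate
        nodes.append((r, c_gate))      # gate
    return nodes

def delay_bounds(nodes, v=0.7):
    """Lower and upper bounds on the time for the far end to reach threshold v."""
    r_end = nodes[-1][0]
    t_p = sum(r * c for r, c in nodes)
    t_d = t_p                          # unbranched line, output at the far end
    t_r = sum(r * r * c for r, c in nodes) / r_end
    ratio = t_d / (t_p * (1.0 - v))
    t_min = max(0.0, t_d - t_p * (1.0 - v), t_r * math.log(ratio))
    if v <= 1.0 - t_d / t_p:
        t_max = t_d / (1.0 - v) - t_r
    else:
        t_max = t_p - t_r + t_p * math.log(ratio)
    return t_min, t_max
```

Calling `delay_bounds(pla_line(n))` for a range of `n` shows the roughly quadratic growth of both bounds with line length.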

VI.  Conclusions 

A computationally efficient method for calculating the signal
delay through MOS interconnect lines with fanout has
been described. Tight upper and lower bounds for the step
response of RC trees have been presented, together with
linear-time algorithms for these bounds from an algebraic description
of the tree. Substantial computational simplicity is achieved
even in the presence of RC distributed lines by representing
the RC tree by a small set of suitably defined characteristic
times, which can be calculated easily and used to generate
the bounds.

Although only the step response is considered here, the results
can be extended to upper and lower bounds for arbitrary
excitation by use of the superposition integral. This extension
is discussed in Appendix E. An example of this calculation
appears in [9].

Extensions of the theory to RC trees with nonlinear elements
(similar to the work of Glasser [17] for nonlinear MOS
inverters) would be desirable for better modeling of MOS circuits.
Investigations of RC trees with nonlinear capacitors and
resistors are now under way, along with attempts to unify the
modeling of gates and interconnects, and in particular to include
pass transistors in the interconnects. Tighter bounds are
also being looked for.


Appendix A

The results of this paper are valid for RC trees that contain
distributed RC lines. All results apply without change, except
that the definitions of $T_P$, $T_{Di}$, and $T_{Ri}$ in (3)-(5) are replaced
by (36)-(38) below.

The summations in (3)-(5) are for the case of lumped capacitors
only, and the index k runs over all lumped capacitors
in the network. The form for networks with distributed RC
lines is similar; the index k runs over both lumped capacitors
and RC lines. The terms for lumped capacitors are unchanged
from (3)-(5), but for distributed lines additional terms appear
in (36)-(38).

Each line k has a total capacitance $C_k$ and appears in the network
with one end (say the left-hand end) nearer the input of
the network. Along the line, the capacitance is distributed,
but the cumulative capacitance c is a function of position,
and has a value between 0 (at the left end) and $C_k$ (at the right
end). For each value of c, there is a value of cumulative resistance
$r(c)$ monotonically increasing with c; $r(0) = 0$ and $r(C_k)$
is the total resistance of the line. For uniform lines, $r(c)$ is a
linear function, and for nonuniform lines $r(c)$ has other shapes.
Define the series of integrals

$$I_k^{(n)} = \int_0^{C_k} [r(c)]^n \, dc. \qquad (35)$$

Note that if the line k is interpreted as a simple RC tree without
any additional elements, then its $T_D$ and $T_P$ are $I_k^{(1)}$ and
its $T_R$ is $I_k^{(2)}/r(C_k)$. For a uniform line, $I_k^{(1)} = r(C_k) C_k/2$ and
$I_k^{(2)} = [r(C_k)]^2 C_k/3$.

For each distributed line k let $R_{kk}$ be the resistance between
the left-hand end of the line and the input of the network, and
$R_{ki}$ be the portion of that resistance that also lies on the
(unique) path between the input and any output i. Then the
expressions for $T_P$, $T_{Di}$, and $T_{Ri}$ are

$$T_P = \sum_k R_{kk} C_k + \sum_k I_k^{(1)} \qquad (36)$$

$$T_{Di} = \sum_k R_{ki} C_k + \sum_k I_k^{(1)} \qquad (37)$$

$$T_{Ri} R_{ii} = \sum_k R_{ki}^2 C_k + 2 \sum_k R_{ki} I_k^{(1)} + \sum_k I_k^{(2)} \qquad (38)$$

where the first sum in all three expressions is over both lumped
capacitors and distributed lines; the second sum in (36) is
over all distributed lines; and the second sum in (37) and the
second and third sums in (38) are over only those distributed
lines which lie along the (unique) path between the input and
output i.

Appendix B

From (3) it is evident that $T_P$ is equal to the sum of all the
open-circuit time constants of the network, a quantity that is
well known in the analysis of multistage amplifiers, and that
has been shown to be equal to the negative of the sum of the
inverse of all the transmission poles [18]. That is, if the normalized
system function $H_i(s)$ for the output i is

$$H_i(s) = \frac{1 + b_1 s + b_2 s^2 + \cdots}{1 + a_1 s + a_2 s^2 + \cdots} \qquad (39)$$

then $T_P = a_1$. Also, it can be shown that $T_{Di} = a_1 - b_1$. To
prove this, one starts from the equality [19] between $a_1 - b_1$
and the first-order time moment $\int_0^\infty h_i(t)\,t\,dt$ of the impulse
response $h_i(t)$. Integrating by parts, one can show that

$$\int_0^\infty h_i(t)\,t\,dt = \int_0^\infty [1 - v_i(t)]\,dt,$$

and therefore is equal to $T_{Di}$, as given by (10). This
completes the proof, and shows that $T_{Di}$ is equal to the
first-order time moment of the impulse response.
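The identity used in Appendix B, that $T_{Di}$ equals the area between the unit step and the step response at output i, can be checked numerically. The Python sketch below integrates a two-section lumped RC ladder with a fixed-step Runge-Kutta method; the function name `step_response_areas`, the ladder topology, and the step size are illustrative assumptions, not part of the paper.

```python
def step_response_areas(r1, c1, r2, c2, t_end=60.0, dt=0.01):
    """Integrate [1 - v_i(t)] for both nodes of a two-section RC ladder.

    Topology assumed: unit step -> r1 -> node 1 (c1) -> r2 -> node 2 (c2).
    """
    def deriv(v1, v2):
        dv1 = ((1.0 - v1) / r1 - (v1 - v2) / r2) / c1
        dv2 = ((v1 - v2) / r2) / c2
        return dv1, dv2

    v1 = v2 = 0.0
    area1 = area2 = 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(v1, v2)
        k2 = deriv(v1 + 0.5 * dt * k1[0], v2 + 0.5 * dt * k1[1])
        k3 = deriv(v1 + 0.5 * dt * k2[0], v2 + 0.5 * dt * k2[1])
        k4 = deriv(v1 + dt * k3[0], v2 + dt * k3[1])
        n1 = v1 + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        n2 = v2 + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        # Trapezoidal accumulation of the area above the step response.
        area1 += 0.5 * dt * ((1.0 - v1) + (1.0 - n1))
        area2 += 0.5 * dt * ((1.0 - v2) + (1.0 - n2))
        v1, v2 = n1, n2
    return area1, area2
```

With all four element values set to 1, the sums $\sum_k R_{ki} C_k$ are 2 for node 1 and 3 for node 2, and the integrated areas should agree with these to within the integration error.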


Appendix  C 

It is proved here that when a linear RC tree is excited with a
step input from an initial rest condition, the voltage at each
node is a monotonic function of time. This condition is identical
to the condition that the impulse response of the same
network is nonnegative. The proof is to assume that, with an
impulse applied at the input, the voltage on one or more nodes
is, at some instant of time, negative, and then show that this
assumption leads to a contradiction.

It is assumed that distributed RC lines in the tree can be replaced
by finite lumped ladder approximations with arbitrarily
close impulse responses, so that if one or more nodes of the
original network has a negative voltage, then so does one or
more nodes of an approximate lumped network. For the remainder
of this appendix, this lumped network is dealt with.

At time $t = 0^+$, the voltages on all nodes are nonnegative.
Let $v_{\min}(t)$ denote the lowest voltage of any node in the network
at time t. Assume that at some time $t_0 > 0$, $v_{\min}(t_0) < 0$.
Then there must be some prior time $t_1$ when $v_{\min}$ and its
derivative with respect to time are both negative.

At time $t_1$, $v_{\min}(t_1)$ is achieved by at least one node. If it
is achieved by more than one, at least one must have a negative
derivative. This node is characterized by having the lowest
voltage in the network, and also a negative derivative. The net
current flowing into this node from other nodes in the RC tree
is nonnegative, because the adjacent nodes, to which this node
is directly connected through resistors, are not at a lower potential.
This net current must flow into the capacitor at that
node, and therefore the rate of change of the node voltage is
nonnegative. This contradiction proves the impossibility of
the assumption above that a node voltage is negative.
Appendix D

Equation (7) is to be proved. Note first, for any three nodes
i, j, and k in the network, that $R_{ii}$ is at least as large as both
$R_{ki}$ and $R_{ji}$. Also, $R_{jk}$ is at least as large as the lesser of $R_{ki}$
and $R_{ji}$. Thus

$$R_{ii} R_{jk} \ge R_{ki} R_{ji}. \qquad (40)$$

Now note that, similar to (9),

$$1 - v_k(t) = \sum_j R_{jk} C_j \frac{dv_j}{dt} \qquad (41)$$

$$1 - v_i(t) = \sum_j R_{ji} C_j \frac{dv_j}{dt} \qquad (42)$$

so that, because of (40) and the fact that $v_j(t)$ is monotonic,

$$R_{ii}[1 - v_k(t)] - R_{ki}[1 - v_i(t)] = \sum_j (R_{ii} R_{jk} - R_{ki} R_{ji}) C_j \frac{dv_j}{dt} \ge 0 \qquad (43)$$

which immediately implies (7).

Appendix  E 

Bounds for the response $y_i(t)$ of an RC tree to an arbitrary
excitation $x(t)$ can be obtained from the upper and lower
bounds $v_{Ui}(t)$ and $v_{Li}(t)$ derived for the unit step response
$v_i(t)$.

First, the superposition integral can be used to obtain $y_i(t)$
as

$$y_i(t) = \int_0^t v_i(t - t') \frac{dx}{dt'}\,dt' = v_i(t) * \frac{dx}{dt} \qquad (44)$$

where * denotes time convolution. From

$$v_{Li}(t) \le v_i(t) \le v_{Ui}(t) \qquad (45)$$

one obtains

$$v_{Li}(t) * \frac{dx}{dt} \le y_i(t) \le v_{Ui}(t) * \frac{dx}{dt}, \qquad \frac{dx}{dt} \ge 0 \qquad (46)$$

$$v_{Ui}(t) * \frac{dx}{dt} \le y_i(t) \le v_{Li}(t) * \frac{dx}{dt}, \qquad \frac{dx}{dt} \le 0 \qquad (47)$$

where $v_{Ui}(t)$ and $v_{Li}(t)$ are known analytically. From (46) it
can be seen that bounds for the ramp response can be obtained
simply by integrating the unit step bounds. Equations (46) and
(47) apply for monotonic inputs.

For the general case where the excitation $x(t)$ has both positive
and negative slopes, one can define the following functions:

$$v_{MINi}(t, t') = \begin{cases} v_{Li}(t - t'), & dx/dt' > 0 \\ v_{Ui}(t - t'), & dx/dt' < 0 \\ 0, & dx/dt' = 0 \end{cases} \qquad (48)$$

$$v_{MAXi}(t, t') = \begin{cases} v_{Ui}(t - t'), & dx/dt' > 0 \\ v_{Li}(t - t'), & dx/dt' < 0 \\ 0, & dx/dt' = 0. \end{cases} \qquad (49)$$

The response $y_i(t)$ is then bounded by

$$\int_0^t v_{MINi}(t, t') \frac{dx}{dt'}\,dt' \le y_i(t) \le \int_0^t v_{MAXi}(t, t') \frac{dx}{dt'}\,dt'. \qquad (50)$$
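For a rising ramp input, the superposition argument of this appendix reduces to integrating a step bound over the ramp interval. The Python sketch below does this numerically with the trapezoidal rule; the function name `ramp_response_bound`, the linear-ramp input, and the sample count are assumptions made for illustration only.

```python
import math

def ramp_response_bound(step_bound, t, rise_time, n=2000):
    """Convolve a step-response bound with the slope of a unit ramp.

    step_bound: callable giving a lower (or upper) bound on the unit step
                response; for a rising input the result is a lower (or upper)
                bound on the ramp response at time t.
    rise_time:  the input rises linearly from 0 to 1 over [0, rise_time].
    """
    upper = min(t, rise_time)
    if upper <= 0.0:
        return 0.0
    h = upper / n
    slope = 1.0 / rise_time            # dx/dt' during the ramp, 0 afterwards
    total = 0.0
    for k in range(n + 1):
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * step_bound(t - k * h)
    return total * h * slope
```

Passing the lower step bound gives a lower bound on the ramp response and passing the upper step bound gives an upper bound, since the input slope is nonnegative throughout.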
ABSTRACT


The  waveform  bounding  approach  to  fast  timing  analysis  of 
MOS  VLSI  circuits  is  discussed.  The  idea  is  to  compute 
rigorous  closed-form  expressions  giving  upper  and  lower 
bounds  for  transient  voltage  waveforms,  rather  than  exact 
values.  The  goal  is  to  enable  rapid  computation  without 
sacrificing  user  confidence  in  the  results. 

I. Background and Objectives


Existing approaches to timing analysis and simulation
of digital integrated circuits fall, roughly
speaking, into three classes:

i) Methods such as SPICE 2 [1] and ASTAP [2], based
on essentially exact numerical solutions of the network's
differential equations, are accurate and reliable, but
quite slow in terms of the needs of the VLSI era.

ii) Specialized MOS timing simulators like MOTIS-C
[3] and SPLICE [4] rely on table lookup of device
characteristics for speed, and save additional time
by terminating a Newton-Raphson or similar iteration
before convergence is reached. SPLICE is in addition
a mixed-mode circuit, timing and logic simulator and
uses a selective trace algorithm to exploit latency.
In both these programs the termination of an iterative
step prior to convergence saves time at the cost of
accuracy and, in some instances, of numerical
stability [5]. The improvement in speed over SPICE 2
is typically one to two orders of magnitude for SPLICE
[4] and about two orders of magnitude for MOTIS-C [6].

iii) More recently, some researchers are
exploring an alternate approach to timing analysis and
simulation based on a radically simplified electrical
description of the network. RSIM [7], CRYSTAL [8],
and TV [9] fall at the far end of the speed-accuracy
tradeoff curve from SPICE 2. A MOSFET is typically
represented in these programs by an extremely simplified
model: a linear resistor in series with a switch.
And a polysilicon or diffusion line is represented by
a lumped capacitance in RSIM, or by a delay in CRYSTAL
and TV obtained by simply averaging the upper and
lower delay bounds obtained by Rubinstein, Penfield,
and Horowitz [10]. These programs are potentially
very fast and have a number of attractive user-oriented
features. The drawback, of course, is that there are
no absolute known limits to the error in their total
delay estimates. The user can never be sure the
answers they give are close enough.

The  objective  of  the  waveform  bounding  approach 
to  timing  analysis  and  simulation  is  to  combine  the 
computational  speed  that  results  from  avoiding  the 
numerical  solution  of  differential  equations  with 
the  user  confidence  in  the  result  that  comes  from 
rigorous  error  bounds.  Our  attack  on  the  timing 
analysis  problem  is  based  on  a  careful  fundamental 
study  of  the  differential  equations  describing  the 
dynamics  of  gates,  pass  transistors,  interconnect, 
and the standard digital circuits constructed from
them. 


The MIT group currently working on this project
includes Profs. Paul Penfield, John Wyatt, and Lance
Glasser and graduate students Charles Zukowski and
Paul Bassett. In addition, Mark Horowitz [10, 11] is
currently completing a dissertation on MOS timing
analysis at Stanford.

II. Response Bounds for Interconnect
2.1) Linear Interconnect Models

This section summarizes the results obtained in
[10]. In this work an MOS signal distribution network
as shown in Fig. 1 is modelled as a branched linear
RC line, i.e., an RC tree, as in Fig. 2.

Figure 1. Typical MOS signal-distribution network.
The inverter is shown driving three gates.

Figure 2. The linear RC tree shown above is a model for
the network of Fig. 1. The voltage source
is a unit step at time t = 0.

For any two nodes k and m in the network, $R_{km}$ is defined as the
sum of the resistances along the route consisting of
the intersection of the path from the input to node k
with the path from the input to node m, as illustrated
in Fig. 3. The three time constants used to derive
response bounds are

$$T_P = \sum_k R_{kk} C_k \qquad (1)$$

$$T_{Di} = \sum_k R_{ki} C_k \qquad (2)$$

$$T_{Ri} = \Big( \sum_k R_{ki}^2 C_k \Big) / R_{ii} \qquad (3)$$

where the summations are taken over all nodes of the
network.

Figure 3. Illustration of resistance terms. [The example values
of the resistance terms given in this caption are not legible
in the scan.]

The derivation in [10] shows that $\underline{v}_i(t) \le v_i(t) \le \overline{v}_i(t)$,
for all $t \ge 0$, where $v_i(t)$ is the actual
zero state step response at any terminal node i, and
the bounds $\underline{v}_i(t)$ and $\overline{v}_i(t)$ are given by

[Equation (4), the upper bound $\overline{v}_i(t)$, is not legible in the scan.]

$$\underline{v}_i(t) = \begin{cases} 0, & 0 \le t \le T_{Di} - T_{Ri} \\ 1 - \dfrac{T_{Di}}{t + T_{Ri}}, & T_{Di} - T_{Ri} \le t \le T_P - T_{Ri} \\ 1 - \dfrac{T_{Di}}{T_P} \exp[(T_P - T_{Ri} - t)/T_P], & T_P - T_{Ri} \le t \end{cases} \qquad (5)$$

Figure 4. Form of the bounds, with the distance from
the exact solution exaggerated for clarity.

The time required to compute these bounds grows only
linearly with the number of elements in the network.
Recent applications of this result include [3, 9, 12].
The ultimate goal of this portion of the project is
to derive a hierarchy of such bounds, permitting the
user to trade off accuracy for computation time.

2.2) Nonlinearities Affecting Interconnect

The linear circuit model in Fig. 2 fails to incorporate
three types of nonlinearities present in Fig. 1
or related circuits: the nonlinear output resistance
of the inverter, the nonlinear gate-to-channel capacitance
of the MOSFET loads, and the nonlinear capacitance
from any diffusion line to substrate. This section
describes recent work [13-16] that allows the bounds
for linear networks [10] to be applied to RC lines
incorporating such nonlinearities. (Further research
is needed for branched lines, i.e., RC trees.)
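For reference, the linear-network quantities of Section 2.1 can be sketched in Python. The tree is described by a parent index per node, which is an assumed representation; the function names are hypothetical, and the path sums are written for clarity rather than for the linear-time cost mentioned in the text. The lower bound is the three-piece form from [10]; the upper bound coded here is one valid bound that follows from the same inequalities and is an assumption, since it may not match the exact form printed in the paper.

```python
import math

def characteristic_times(parent, r, c, out):
    """Return (T_P, T_D, T_R) for output node `out` of an RC tree.

    parent[k]: parent node of k, or -1 if k hangs directly off the input
    r[k]:      resistance of the branch feeding node k
    c[k]:      capacitance from node k to ground
    """
    def path(k):
        nodes = []
        while k != -1:
            nodes.append(k)
            k = parent[k]
        return nodes

    out_path = set(path(out))
    r_ii = sum(r[j] for j in out_path)
    t_p = t_d = t_r = 0.0
    for k in range(len(parent)):
        p = path(k)
        r_kk = sum(r[j] for j in p)
        r_ki = sum(r[j] for j in p if j in out_path)   # shared part of the paths
        t_p += r_kk * c[k]
        t_d += r_ki * c[k]
        t_r += r_ki ** 2 * c[k] / r_ii
    return t_p, t_d, t_r

def lower_bound(t, t_p, t_d, t_r):
    """Three-piece lower bound on the unit step response."""
    if t <= t_d - t_r:
        return 0.0
    if t <= t_p - t_r:
        return 1.0 - t_d / (t + t_r)
    return 1.0 - (t_d / t_p) * math.exp((t_p - t_r - t) / t_p)

def upper_bound(t, t_p, t_d, t_r):
    """An upper bound on the unit step response (assumed form, see lead-in)."""
    return min(1.0 - (t_d - t) / t_p,
               1.0 - (t_d / t_p) * math.exp(-t / t_r))
```

For a two-section ladder with unit elements, `characteristic_times([-1, 0], [1.0, 1.0], [1.0, 1.0], 1)` gives the three times 3, 3, and 2.5, and the exact step response lies between the two bound functions for all t.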


Using the notation and sign conventions illustrated
in Fig. 5, the state equations for any nonuniform, nonlinear lumped
RC line with N capacitors can be written in the form

[The state equations are not legible in the scan.]  $1 \le j \le N$, (6)

where $g_0 \equiv 0$, $v_{N+1} \equiv e$, and the capacitor constitutive
relations $q_j = h_j(v_j)$ are continuously
differentiable with $C_j(v_j) = h_j'(v_j) > 0$ everywhere.
We assume the resistor curves are continuously
differentiable, strictly increasing, and pass through
the origin.

Figure 5. Two-capacitor example of a nonlinear, nonuniform
RC line.

The state space for (6) is the set of all vectors
of capacitor voltages $v \in R^N$. The four subsets of
the state space defined below play a key role in the
theoretical development [15].

Def. 1

$$P \triangleq \{ v \in R^N \mid v_j \ge 0,\ 1 \le j \le N \} \qquad (7)$$

$$L(e) \triangleq \{ v \in R^N \mid v_j \le e,\ 1 \le j \le N \} \qquad (8)$$

$$S_I(e) \triangleq \{ v \in R^N \mid v_j \le v_{j+1},\ 1 \le j \le N \} \qquad (9)$$

$$T_I(e) \triangleq \{ v \in R^N \mid \dot{v}_j \ge 0 \text{ in (6)},\ 1 \le j \le N \} \qquad (10)$$

See Figs. 6-9. Roughly speaking, P is the set of all
vectors of positive capacitor voltages, L(e) is the set
of all vectors of capacitor voltages less than e, $S_I(e)$
consists of spatially increasing voltages as one
traverses the line in the direction of the source, and
$T_I(e)$ consists of temporally increasing (non-discharging)
voltages.

Figure 6. The set P for the network in Fig. 5.

Figure 7. The set L(e) for the network in Fig. 5, when
e = 3 V.

Figure 8. The set $S_I(e)$ for the network in Fig. 5, when
e = 3 V.

Figure 9. The set $T_I(e)$ for the network in Fig. 5, if
both resistors are 1 Ω and e = 3 V.

Def. 2

For a vector ordinary differential equation
$\dot{x} = f(x, t)$, $x \in R^N$, a (possibly time-dependent) set
of states $S(t) \subset R^N$ is positive invariant iff for
all $t_1 \ge 0$,

$$x(t_1) \in S(t_1) \Rightarrow x(t) \in S(t), \quad \text{for all } t \ge t_1.$$

Roughly speaking, a positive invariant set is a
"cage" that traps every trajectory that enters it.
Positive invariant sets give us information about the
location of trajectories without requiring that we
solve the network differential equations.

Lemma 1 [15]

For any nonlinear, nonuniform RC line, if e(t) is
continuously differentiable, $e(t) \ge 0$, and $\dot{e}(t) \ge 0$ for
$t \ge 0$, then P, L(e), $S_I(e)$ and $T_I(e)$ are all positive
invariant.

From Lemma 1 we conclude, for example, that at
any instant during an "up" transition from equilibrium,
$v_j(t) \ge 0$, $v_j(t) \le e$, $v_j(t) \le v_{j+1}(t)$ and $\dot{v}_j(t) \ge 0$,
$1 \le j \le N$.

The following partial orderings let us compare the
resistances of nonlinear resistors and the capacitances
of nonlinear capacitors [16].

Def. 3

For a collection of 2-terminal nonlinear resistors,
with $R_j$ characterized by $i = g_j(v)$, we say $R_j \le R_k$ iff
$[g_j(v) - g_k(v)]\,v \ge 0$ for all v. For nonlinear
capacitors, with $C_j$ characterized by $i = C_j(v)\,\dot{v}$, we
say $C_1 \le C_2$ iff $C_1(v) \le C_2(v)$ for each v.

Thus we compare large-signal (or "chord")
resistances, but incremental capacitances. See Fig.
10. Using Lemma 1 and Def. 3, we can formulate
and prove [16] the

Monotone Response Theorem for Nonlinear,
Nonuniform RC Lines.

Figure 10. Resistor $R_k$ is characterized by $i = g_k(v)$,
k = 1, 2, 3. Resistors $R_1$ and $R_2$ pass
current more easily than $R_3$, i.e. $R_3$ is a
larger resistor than the other two. No
such global comparison between $R_1$ and $R_2$
is possible.

Given a nonlinear RC line as described above.
Suppose that (because of circuit parameter uncertainty,
the use of linearized models for nonlinear elements,
replacing the exact input by input bounds, etc.) we
do one or more of the following:

a) overestimate the input e(t),

b) underestimate one or more R's,

c) underestimate one or more C's.

The resulting circuit model will then necessarily
overestimate the output $v_i(t)$ at each instant t
during "up" transitions (i.e., during transitions
where $e \ge 0$, $\dot{e} \ge 0$ throughout).

A similar result holds for "down" transitions and
estimate errors of the opposite sign. Using part a)
of the assumptions, this theorem allows us to
computationally propagate upper and lower signal bounds
through the network. Using parts b) and c), it
allows us to replace a nonlinear line by two linear
ones, one strictly faster and one strictly slower, to
which the linear network bounds (4,5) in turn apply.
We have not yet succeeded in finding a generalization
of this result that will apply to nonlinear RC trees.
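The resistor ordering of Def. 3 and the bracketing of a nonlinear resistor by a faster and a slower linear one can be sketched in Python. The check is made on a finite grid of sample voltages, which is an approximation of the "for all v" condition; the function names, the tolerance argument, and the example curve are assumptions for illustration.

```python
def resistor_leq(g_j, g_k, voltages, tol=1e-12):
    """Sampled test of R_j <= R_k: [g_j(v) - g_k(v)] * v >= 0 at each sample.

    tol absorbs floating-point rounding when the two curves touch.
    """
    return all((g_j(v) - g_k(v)) * v >= -tol for v in voltages)

def chord_resistance_range(g, voltages):
    """Smallest and largest chord resistance v / g(v) over the samples."""
    chords = [v / g(v) for v in voltages if v != 0.0]
    return min(chords), max(chords)

def linear_resistor(resistance):
    """Current-voltage curve i = v / R of a linear resistor."""
    return lambda v: v / resistance
```

Given a nonlinear curve such as `g = lambda v: v + 0.5 * v ** 3` and samples over the working voltage range, `chord_resistance_range` returns two resistances: by the theorem, a line built with the smaller one responds no slower than the nonlinear line, and one built with the larger one responds no faster.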

III. An Approach to Waveform Bounding
for MOS Logic Gates

The results reported here are due to Charles
Zukowski, and apply to MOS device models of the
form

$$i_D = \frac{W}{L} f(v_{GB}, v_{DB}, v_{SB}), \qquad (11)$$

where D, G, S and B refer to drain, gate, source and
substrate, respectively. For specificity we consider
only n-channel devices in this paper. No special
algebraic form for f is assumed, only that f is
continuously differentiable and satisfies the natural
monotonicity conditions

$$\frac{\partial f}{\partial v_{GB}} \ge 0, \quad \frac{\partial f}{\partial v_{DB}} \ge 0, \quad \frac{\partial f}{\partial v_{SB}} \le 0 \qquad (12)$$

everywhere. Thus a wide variety of device models are
allowed, with the exception that (11) does not allow
for short-channel effects.

Our approach will be to reduce a multiple-input
logic gate by steps to an "equivalent bounding
inverter" and then to find bounds for the response of
this inverter.


3.1) Reduction of Series-Parallel Transistor Network
to "Equivalent Bounding Transistor"


We have developed a method for reducing any series-
parallel transistor network to a single "equivalent
bounding transistor." Using the technique recursively,
one can replace the pullup or pulldown network of a
multiple-input gate by a single transistor and have
rigorous bounds for the error produced by this
simplification.


For example, a parallel connection of N
transistors, all identical except for widths, lengths
and gate voltages, satisfies

$$i = F(\mathbf{v}_{GB}, v_{DB}, v_{SB}) = \sum_{j=1}^{N} \frac{W_j}{L_j} f(v_{GBj}, v_{DB}, v_{SB}), \qquad (13)$$

where $\mathbf{v}_{GB}$ is the vector of gate voltages. We have
proven that, because of the assumptions (12), there
exist $W_{eq}$, $L_{eq}$ independent of $\mathbf{v}_{GB}$, and $\underline{v}_{GB}$ and $\overline{v}_{GB}$
that depend on $\mathbf{v}_{GB}$, such that (13) can be replaced by
the simpler bounds

$$\frac{W_{eq}}{L_{eq}} f(\underline{v}_{GB}, v_{DB}, v_{SB}) \le i \le \frac{W_{eq}}{L_{eq}} f(\overline{v}_{GB}, v_{DB}, v_{SB}), \qquad (14)$$

for all $v_{DB} \ge v_{SB}$, describing a single transistor
with a range of gate voltages. The function f is the
same throughout (13) and (14). Figure 11 illustrates
this process for N = 2.


Figure 11. Replacing a parallel transistor network by
an "equivalent bounding transistor". The
cost of this simplification is that the
exact value of i for the network on the left
is replaced by a range of values in the
simpler model corresponding to $\underline{v}_{GB} \le v_{GB} \le \overline{v}_{GB}$.

3.2) Reducing a Multiple-Input Gate to an "Equivalent
Bounding Inverter"

A gate can be modelled as an "equivalent bounding
inverter" by performing the reduction outlined in
section 3.1 on both the pullup and pulldown networks,
reducing each to a single transistor. Initial trials,
comparing the result with SPICE simulations of the
original network, indicate that the resulting bounds
for $i_{out}(v_{out})$ differ from the exact values by only
about ±10% for practical circuits.

3.3) Bounding the Response of an Inverter and Load to
Input Transitions

When applied to some multiple-input gates, the
reduction procedure described in the previous two
subsections may yield an inverter in which the pullup
gate is externally driven. But for simplicity we
consider here only the case of a standard NMOS
depletion-load inverter as in Fig. 12.


Figure 12. Depletion-load inverter.


To bound the response time of the loaded
inverter we need simple bounds on the function
$i_{out}(v_{out}, v_{in})$, which is the difference of the pullup
and pulldown currents:

$$i_{out}(v_{out}, v_{in}) = i_{pu}(v_{out}) - i_{pd}(v_{out}, v_{in}).$$

Simple linear bounds on both the pullup and pulldown
currents are shown in Fig. 13. The resulting bounds
for the output curve $i_{out}(v_{out})$ depend on $v_{in}(t)$.


Figure 13. Simple linear bounds on the pullup and
pulldown currents. The latter depend on
$v_{in}$, and hence on t.

Initial simulations using this approach indicate that
the delay bounds for these simplified models differ
from the delays obtained from SPICE simulations by
about ±15%.


IV. Further Work in Progress

Much work remains to be done before the
theoretical basis for the waveform bounding approach
to timing analysis is complete. Among the larger
remaining problems are:

1. extending the Penfield-Rubinstein bounds to
incorporate time-varying source resistances, such as
those modelling the pulldown current in Fig. 13,

2. finding bounds for the response of an RC tree
containing pass transistors,

3. investigating the tolerance in the bounds
obtained so far and finding tighter ones where
necessary, and

4. incorporating effects of the Miller capacitance
into bounds.
Electrical characterization of (Al,In)As ion implanted with Be and Si following low and high dose
multiple energy implant schedules, and annealed between 740 and 815 °C with a pyrolytic silicon
dioxide cap is reported. Only very low activation of Be is achieved (< 3%). Silicon activation is
considerably higher (> 40%) and increases with increasing anneal temperature. However, a high
concentration n-type surface layer is found on samples annealed at 815 °C. This surface layer is
not found on similarly annealed samples which were not implanted, or which were implanted
with phosphorus.

PACS  numbers:  61.70.Tm,  72.80.Ey 


INTRODUCTION 

Recent interest in the III-V heterojunction structures
for high performance microwave and optoelectronic devices
and also for high speed transistor logic has increased
the need for lattice-matched compound semiconductor materials.
Various systems, such as GaAs/(Al,Ga)As and InP/
(In,Ga)(As,P), have been utilized. Another quaternary system
is (Al,Ga,In)As which has a wider maximum band gap
(1.53 eV) than InP (1.35 eV) at 300 K, and most importantly
can be grown lattice matched to (In,Ga)As and to InP
substrates by both molecular beam epitaxy (MBE) and metal
organic chemical vapor deposition (MOCVD) as an attractive
alternative to the phosphorus containing
(In,Ga)(As,P) for the epitaxial layers in heterojunction structures.
Since this quaternary contains only one group V element,
As, which is much less volatile than phosphorus, it
may also be preferred to InP because of its stability during
high temperature processing. Doping of the wide band-gap
ternary limit of this quaternary, Al0.48In0.52As, with silicon,
tin, and beryllium during MBE growth has been reported.
Selective doping of this material by ion implantation or diffusion
is, however, also required for forming n- and p-type
regions in device work. In this letter, we discuss for the first
time the implantation and annealing of Si and Be in
Al0.48In0.52As.

PROCEDURES 

Room temperature ion implantation was done on
1.8-2-µm-thick (Al,In)As undoped layers grown on <100> oriented
S-doped as well as Fe-doped InP substrates by
MBE. The layers were semi-insulating, having resistivity
greater  than  10^  12  cm. 

Two  multiple  energy  Si  implant  schedules  were  used:  a 
"low”  dose  implant  for  creating  a  low  carrier  concentration 
R  layer  and  a  “high”  dose  implant  for  a  high  carrier  r  layer. 
For  Be,  a  single  two-energy  implant  schedule  was  used. 
These  schedules  are  detailed  in  Table  I. 

To anneal the samples, a thin layer (~40 nm) of pyrolytically
grown SiO₂ was deposited on the implanted surface, and the
samples were annealed in a forming gas atmosphere for 15-20 min
at various temperatures ranging from 740 to 815 °C.

a) Also at Bell Laboratories, Murray Hill, NJ 07974.

The carrier concentration and mobility of the silicon
implanted layer were profiled using differential Hall
measurements. After each measurement ~22-88 nm of the
material was removed by using an etching solution of H₂O,
H₂SO₄, and H₂O₂ (145:4:1 by volume) for 10-40 s (etch
rate: 2.2 nm/s) and then the measurement was repeated.
C-V measurements on Au Schottky diodes were used to
determine the implanted Be profile. Here also the repeated
etch-and-measure technique was used. Thickness measurements
on a control sample etched simultaneously with the above
samples were used to determine the thickness etched off and
the flatness of the surface.
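
The etch-and-measure procedure can be illustrated with a short sketch. It is not the authors' data reduction; it assumes the common two-layer differential Hall relations with a Hall factor of 1, and the function names and the layer values used to exercise it are hypothetical. Only the 2.2 nm/s etch rate and the 10-40 s etch times come from the text.

```python
Q = 1.602e-19  # elementary charge in coulombs


def etch_depth_nm(rate_nm_per_s, seconds):
    """Thickness removed in one etch step (rate times time)."""
    return rate_nm_per_s * seconds


def sheet_values(layers):
    """Forward model used only to generate test data.

    layers: list of (concentration cm^-3, mobility cm^2/V s, thickness cm).
    Returns (sheet conductance in S, sheet Hall coefficient in cm^2/C),
    assuming a Hall factor of 1.
    """
    sigma = Q * sum(n * mu * d for n, mu, d in layers)
    rh_sigma2 = Q * sum(n * mu ** 2 * d for n, mu, d in layers)
    return sigma, rh_sigma2 / sigma ** 2


def removed_layer(before, after, thickness_cm):
    """Concentration and mobility of the layer etched off between two
    Hall measurements, each given as (sheet conductance, sheet Hall coeff.)."""
    sigma_b, rh_b = before
    sigma_a, rh_a = after
    d_sigma = sigma_b - sigma_a
    d_rh_sigma2 = rh_b * sigma_b ** 2 - rh_a * sigma_a ** 2
    mobility = d_rh_sigma2 / d_sigma
    concentration = d_sigma / (Q * mobility * thickness_cm)
    return concentration, mobility
```

With the quoted etch rate, a 10-s step removes about 22 nm and a 40-s step about 88 nm, which matches the range given above.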

EXPERIMENTAL  RESULTS 

The measured doping profiles of the silicon implants
are shown in Fig. 1. The two dotted curves are the silicon
profiles predicted by Lindhard-Scharff-Schiott (LSS) theory.
The upper dotted curve represents the LSS profile for
the high dose silicon implant schedule. The upper three solid
curves passing through the experimental points are the
measured doping profiles for the high dose schedule annealed at
three different temperatures, 740, 760, and 815 °C, for 15,
20, and 20 min, respectively. The lower solid curve presents
the measured doping profile for the low dose schedule
annealed at 750 °C for 20 min. The mobility profiles
corresponding to the samples and data in Fig. 1 are presented in
Fig. 2.

TABLE I. Silicon and beryllium multiple energy implant schedules used to
dope (Al,In)As n and p type. All implants were performed with the
substrates nominally at room temperature. The anneal procedures used are
detailed in the text.

Schedule             Energy (keV)    Dose (XlO-'^cm-*)

Silicon low dose         100              1.06
                         230              2.66

Silicon high dose        100              5.60
                         250             14.0

Beryllium                 50              3.35
                         120              9.18

FIG. 1. Doping profiles of silicon implanted into (Al,In)As. The dotted
curves indicate the profiles expected on the basis of LSS theory for the high
and low dose schedules presented in Table I. The solid curves passing
through the data points are the experimentally measured profiles for
samples implanted with the high dose schedule annealed at 740 °C for 15 min
(■), 760 °C for 20 min (●), and at 815 °C for 20 min (▲), and with the low
dose schedule and annealed at 750 °C for 20 min (^). A pyrolytically
deposited SiO₂ cap was used during all anneals.

FIG. 2. Room temperature electron Hall mobility profiles corresponding to
the silicon doping profiles presented in Fig. 1. The notation used in this
figure is the same as in Fig. 1.

FIG. 3. Doping profiles of beryllium implanted into (Al,In)As. The dotted
curve is the profile predicted by LSS theory and the solid and dashed curve
is drawn through the experimental data. Measurements on two samples,
one annealed 15 min at 740 °C (●), and the other annealed 20 min at 815 °C
(▲), are presented.


The doping profiles measured by C-V measurements on
the Be-implanted samples are presented in Fig. 3 along with
the profile predicted by LSS theory (dotted line). Three data
points for a sample annealed 15 min at 740 °C and one for a
sample annealed 20 min at 815 °C are shown, along with a
solid curve indicating the corresponding doping profile. A
comparison of the areas under the measured profile and the
LSS profile indicates an activation efficiency of approximately
3%.
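
The area comparison can be sketched as follows. This is a minimal illustration, assuming a single Gaussian as a stand-in for the LSS profile and trapezoidal integration; the projected range, straggle, and dose used to exercise it are hypothetical placeholders, not values from this work.

```python
import math


def gaussian_profile(depth_cm, dose_cm2, rp_cm, drp_cm):
    """Single-Gaussian approximation to an implant profile (cm^-3).

    rp_cm is the projected range and drp_cm the straggle; both are
    inputs that the reader must supply (hypothetical here).
    """
    norm = dose_cm2 / (math.sqrt(2.0 * math.pi) * drp_cm)
    return norm * math.exp(-((depth_cm - rp_cm) ** 2) / (2.0 * drp_cm ** 2))


def trapezoid_area(xs, ys):
    """Area under a sampled curve by the trapezoidal rule."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def activation_efficiency(depths_cm, measured_cm3, predicted_cm3):
    """Ratio of the area under the measured carrier profile to the area
    under the predicted dopant profile, both sampled at depths_cm."""
    return (trapezoid_area(depths_cm, measured_cm3)
            / trapezoid_area(depths_cm, predicted_cm3))
```

A measured profile whose area is 3% of the predicted one gives an efficiency of 0.03, which is the sense in which the figure of approximately 3% is quoted above.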

Because a high carrier concentration was observed near the
surface of the Si-implanted samples annealed at 815 °C (see
Fig. 1), additional annealing experiments were performed on
unimplanted and phosphorus-implanted samples. The results of
these studies are presented in the Discussion section, which
follows.

DISCUSSION 

Looking first at Fig. 1, the Si implant profiles, and
specifically the measured curves, it is seen that as the annealing
temperature is increased, the profiles fall off more sharply,
e.g., the tail of the implant profile annealed at 740 °C extends
furthest into the material, whereas the one annealed at
815 °C does the least. This observation is consistent with a
similar phenomenon in (In,Ga)As reported elsewhere.

The experimental profiles for the higher dose schedule
close to the surface indicate a very high carrier concentration
near the surface, similar to the conducting layer which has
also been reported by Davies et al. in the case of furnace
annealed silicon implanted InP. This phenomenon artificially
increases the apparent activation efficiency to more
than 100%. In our case, the higher dose implant annealed at
815 °C has an apparent activation of about 400% while the
samples annealed at 760 and 740 °C show 97% and 47%
activation, respectively. The high activation efficiency in the
first case is due to the surface layer.

To better understand the presence of this high concentration
surface layer, an unimplanted sample of (Al,In)As
coated with 400 Å of SiO₂ was annealed successively at 750,
785, and 815 °C for periods of 20 min and the sheet resistance
was measured after the anneal at each temperature. The
sheet resistance was too high to be measured after the 20-min
anneal at 750 °C. After the additional 20 min at 785 °C, the
sheet resistance was 4.5 × 10^ Ω per square, and after another
20 min at 815 °C the sheet resistance was 6.2 × 10⁴ Ω
per square. For comparison, the sheet resistance of a sample
implanted following the high dose Si schedule and annealed
at 815 °C was measured to be 372 Ω per square. This is ≈200
times lower than the unimplanted sample annealed at
815 °C, and is ~5 times lower than the value of 1500-2000 Ω
per square that would be expected if the conductivity were
due solely to full activation of implanted Si. There is clearly a
significant difference in the surface layer that forms on
samples that have been implanted with silicon and on
unimplanted samples.
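
The comparisons in this paragraph are simple ratios, sketched below. The power of ten on the unimplanted-sample resistance is garbled in the source scan; the value 6.2 × 10⁴ Ω per square used here is an assumption, chosen because it is the order of magnitude consistent with the quoted factor of roughly 200. The constant and function names are hypothetical.

```python
R_SI_IMPLANTED = 372.0               # ohm per square, from the text
R_UNIMPLANTED_815C = 6.2e4           # ohm per square; exponent ASSUMED
R_FULL_ACTIVATION = (1500.0, 2000.0)  # ohm per square, range from the text


def times_lower(reference_ohm_sq, sample_ohm_sq):
    """How many times lower the sample sheet resistance is than a reference."""
    return reference_ohm_sq / sample_ohm_sq


ratio_vs_unimplanted = times_lower(R_UNIMPLANTED_815C, R_SI_IMPLANTED)
ratio_vs_full_activation = tuple(
    times_lower(r, R_SI_IMPLANTED) for r in R_FULL_ACTIVATION
)
```

Under that assumption the first ratio comes out on the order of a couple of hundred, and the second falls between about 4 and 5.4, in line with the "~5 times" stated above.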

To examine the role of the implantation damage in the
creation of the high concentration surface layer, the same
(Al,In)As sample was implanted with phosphorus following
the high dose schedule. Phosphorus was chosen because it is
adjacent to Si in the periodic table, and should be
electronically relatively inert in (Al,In)As. A pyrolytic SiO₂ layer
(40 nm thick) was deposited on the surface and the sample was
annealed 20 min at 815 °C. After the SiO₂ was removed, the
sheet resistance was measured to be 2 × 10⁴ Ω per square.
This is lower than before by approximately 3 times but still
much higher than for the Si-implanted samples.

It should be noted that a high concentration n-type
surface layer was not observed when the Be-implanted sample
was annealed at 815 °C either. Thus, we can conclude that
the damage caused to the surface during ion implantation is
in itself not sufficient to result in a high concentration
surface layer after an 815 °C anneal. It seems either that the
presence of Si contributes to the surface change, or that the
presence of P and/or Be aids in stabilizing the surface.

While the high dose Si implants all show a high surface
concentration, there is no evidence of a high concentration
surface layer on the sample implanted following the low dose
schedule. The activation efficiency for this lower dose
implant is only 23%. Although this activation is lower than the
comparable activation of silicon implants in InP or GaAs,
this may be related to the higher band gap of this material.
Annealing at higher temperatures, or implanting with the
substrates heated, may consequently improve the activation
efficiency.

Looking now at Fig. 2, the mobility data, if this figure is
viewed along with Fig. 1, it is noted that the profiles having
the lowest carrier concentration have the highest mobility
and vice versa, with the exception of the high dose sample
annealed at 740 °C. This sample has the highest mobility
although the carrier concentration of the low dose sample is
lower. This difference may simply reflect variations in the
initial epitaxial layers. The mobility values at different
concentration levels are in very good agreement with the
reported mobility of tin and silicon doped MBE-grown
(Al,In)As. The lower values of the mobility deeper into the
material, viewed along with the lower implant activation in
that region, are felt to indicate that significant
damage still remains after annealing.

Turning now to the Be implant data, Fig. 3, the
activation was found to be only 2.7%. The depth of the profile is
also very shallow. Annealing at a higher temperature
(815 °C) did not change the sheet resistance or
concentration appreciably. It is reported that for (In,Ga)As,
InP, and GaAs (Ref. 14) the activation of Be is always
much less than that of Si. Possibly a similar phenomenon is
occurring here. (Al,In)As, having a higher band gap than any
of those materials, would be expected to require still higher
anneal temperatures and show even lower activation.

CONCLUSIONS  AND  SUMMARY 

This work has demonstrated that it is possible to dope
(Al,In)As n- and p-type with ion implantation. The general
behavior of ion implanted silicon and beryllium is in many
ways similar in this material to what it is in related III-V
compounds, but as with other III-V's, the need for
additional research on ion implantation and annealing
technologies is also clear.

Activation in the case of Be implantation is very low,
approximately 3%, and the activation of Si is also low,
≈40%, although it is much higher than that of Be. Increasing
the post-implant anneal temperature from 740 to 815 °C
did not increase the Be activation but did increase slightly
the Si activation and the associated carrier mobilities.

The most significant result of the higher anneal
temperature was the appearance of a high concentration surface
layer on the Si-implanted samples. This effect was not
observed when unimplanted samples, and samples implanted
with phosphorus (at the same doses and energies) and
beryllium (similar dose but lower energies), were similarly
annealed. Experiments directed at determining the reasons for
this different behavior, and thereby ways of stabilizing the
surface, are planned. If, for example, implanting P does
stabilize the surface, then no high concentration layer should be
found when Si and P are implanted together. If it is Si itself
which causes the problem, then a high concentration layer
would be expected even when P and Si are both implanted.
To help further determine the roles played by damage and
the various ionic species in causing or preventing surface
changes during the anneal, proton bombardment could also
be used to damage the surface. The effect of reducing the
implant energies must also be considered.

The observation of surface changes is important to the
more general issue of increasing the activation efficiency of
any species because the present results indicate that still
higher anneal temperatures should be considered. This, and
the use of substrate heating during implantation, would
appear to be logical steps to take in attempting to increase the
activation and mobilities, reduce the sheet resistances, and
remove implant damage.

A 100-kW water-walled dc argon arc lamp has been used for the first time to post anneal
ion-implanted InP samples. Temperatures as high as 925 °C and short cycle times (3 and 10 s) are used
for the process. Se and Be were ion implanted into room temperature and hot substrate InP
samples. A sputter deposited SiO₂ layer, 120 nm thick, covering all wafer surfaces was used as an
encapsulant during the lamp anneal. Hot substrate Se implants (400 keV, 1.8 × 10¹⁴ cm⁻², 200 °C)
show an average mobility of 1415 cm²/V s and an activation of ~63%, and room-temperature Be
implants (50 keV, 3.35 × 10¹³ cm⁻² and 150 keV, 5.74 × 10¹³ cm⁻²) an average mobility of 88
cm²/V s and activation of ~45%. This annealing technique is straightforward and gives
activations and mobilities comparable to, or better than, the best furnace anneals with sharp
profiles and simplified surface encapsulation.

PACS  numbers:  61.70.Tm,  72.80.Ey,  81.40.Ef,  81.40.Rs 


The development of ion implantation technology for
InP is very important because of its potential use in the
fabrication of optoelectronic and microwave devices.
Post-implant annealing of the implanted material to remove the
unwanted damage introduced during ion implantation is
perhaps the most important step in the entire process.
Annealing of ion-implanted InP has previously been reported
using a conventional furnace, pulsed laser beams, pulsed
electron beams, and a graphite strip heater. In this letter,
we present for the first time the mobility and carrier
concentration profiles for room-temperature and hot (200 °C) Se
and Be implants in InP annealed using an ultrahigh power
water-walled dc argon arc lamp for very short periods: 3 and
10 s.

The InP samples used in this work were cut to 1 × 1 cm
from (100) oriented iron doped, semi-insulating wafers. The
surface was uncoated and was polished chemically prior to
the implantation. For implantation into the heated InP, the
sample was mounted on a copper plate with gallium. This
plate holding the sample was heated by a carbon strip heater;
a thermocouple was used to monitor the temperature. A
dose of 1.8 × 10¹⁴ cm⁻² at 400 keV was used for Se
implants. For Be, multiple energy implants of 50 and 150
keV with doses of 3.35 × 10¹³ and 5.74 × 10¹³ cm⁻²,
respectively, were used. After implantation and prior to the
lamp annealing, 120 nm of SiO₂ was sputter deposited as a
cap on both surfaces of the samples.

The argon arc lamp used in the annealing system
consists of a single quartz tube into which cold water is injected
so that it spirals along the inside wall of the tube forming an
inner water wall. The argon gas flows down the center of this
water-walled tube. As much as 100 kW of power can be
applied to the arc, which is 1 cm in diameter and 20 cm long.
In addition to directly cooling the arc, the water prevents
any material sputtered from the electrodes from depositing
on the walls of the quartz tube. The samples are mounted in
front of the tube and its reflector system on small quartz
pins. The temperature is measured by an optical pyrometer
viewing the back of the sample. A more detailed description
of the arc lamp annealing system is published elsewhere.
Anneal cycles of 3- and 10-s duration, like those shown
in Fig. 1, were used and peak temperatures of 900 and 925 °C,
respectively, were obtained as illustrated. The sample
temperature is essentially uniform throughout its bulk during
the anneal cycle. The rise and fall times of the temperature
transient are determined by the sample thickness, whereas
the peak temperature is determined by the lamp current, the
thickness of the SiO₂ coating, and, in the case of the 3-s
anneal where the system does not reach thermal steady state,
the sample thickness. No change in either the thickness, etch
rate, or the refractive index of the SiO₂ film, nor in the
appearance of the InP surface could be detected after the
anneals.

FIG. 1. Time-temperature cycles for the InP samples during the 3- and 10-s
arc-lamp anneal sequences. The temperature is the reading of an optical
pyrometer viewing the back of the wafer.

TABLE I. Summary of Hall characterization of arc-lamp annealed selenium
implanted InP samples.

Selenium                      Substrate during implant
Anneal cycle             Cold (RT)             Hot (200 °C)

3 s                      185 Ω/□               52 Ω/□
900 °C peak              µ = 837 cm²/V s       µ = 1373 cm²/V s
                         22.5%                 48%

10 s                     77 Ω/□                40 Ω/□
925 °C peak              µ = 996 cm²/V s       µ = 1415 cm²/V s
                         45%                   61%

Two sets of samples, each containing four InP pieces,
were implanted with Se and Be, respectively, following the
schedules mentioned above. Two samples in each set were
implanted at room temperature and the other two at 200 °C.
Each set was then further divided and half were arc lamp
annealed for 3 s and half for 10 s.

After annealing, the oxide was etched off and Van der
Pauw patterns were mesa etched on the material using
conventional photolithography. For n-type material the contact
pads were plated with indium and gold, and for p-type
material they were plated with Zn and Au. The contacts were
sintered at 420 °C for 45 s in a forming gas atmosphere.

The samples were all characterized using room-temperature
Hall measurements. Tables I and II summarize the
results in terms of sheet resistance, average sheet mobility,
and total percent activation achieved in each case for the Se
and Be implants, respectively.
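
The three quantities in Tables I and II are linked by the usual sheet relation N_s = 1/(q µ R_s), with the percent activation being N_s divided by the implanted dose. The sketch below is a hedged consistency check, not the authors' procedure. The powers of ten on the doses are garbled in the source scan; the values 1.8 × 10¹⁴ cm⁻² for Se and 3.35 × 10¹³ plus 5.74 × 10¹³ cm⁻² for Be are assumptions, adopted because they reproduce the tabulated activations. Function and constant names are hypothetical.

```python
Q = 1.602e-19  # elementary charge in coulombs

SE_DOSE = 1.8e14              # cm^-2, exponent ASSUMED (garbled in scan)
BE_DOSE = 3.35e13 + 5.74e13   # cm^-2, exponents ASSUMED (garbled in scan)


def sheet_carrier_density(sheet_resistance_ohm_sq, mobility_cm2_vs):
    """Sheet carrier density (cm^-2) from sheet resistance and mobility."""
    return 1.0 / (Q * sheet_resistance_ohm_sq * mobility_cm2_vs)


def percent_activation(sheet_resistance_ohm_sq, mobility_cm2_vs, dose_cm2):
    """Active sheet carriers as a percentage of the implanted dose."""
    return 100.0 * sheet_carrier_density(
        sheet_resistance_ohm_sq, mobility_cm2_vs) / dose_cm2


# (sheet resistance, mobility) pairs as listed in Tables I and II
se_hot_10s = percent_activation(40.0, 1415.0, SE_DOSE)
se_cold_3s = percent_activation(185.0, 837.0, SE_DOSE)
be_cold_10s = percent_activation(1665.0, 88.0, BE_DOSE)
```

Under these assumed doses the computed activations land within about one percentage point of the tabulated 61%, 22.5%, and 47%.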

Referring first to the selenium implants (Table I), the
hot substrate implants give consistently better results (lower
sheet resistance, higher mobilities, and higher activation)
than the room-temperature implants, and the 10-s
anneal, which reaches the highest temperature at 925 °C, is
superior to the 3-s anneal. Comparing our lamp annealed
results with the best known furnace annealed results for the
same dose of Se, we see that the activation for both room
temperature and hot substrate implants is comparable, and
the mobilities are slightly higher. Multiply scanned electron
beam annealed Se implanted InP shows much lower
activation (~36%) and mobility than we see with lamp
annealing. It has been shown in the literature that heavy ions like Se
when implanted into InP or other III-V compounds at room
temperature tend to amorphize the material and create
much damage. Implanting into a hot substrate helps prevent
amorphization and, thus, less damage is created and better
electrical properties are achieved. Our results with lamp
annealing are consistent with these observations.

TABLE II. Summary of Hall characterization of arc-lamp annealed
beryllium implanted InP samples. The hot substrate implants were too low
concentration for satisfactory measurement.

Beryllium                     Substrate during implant
Anneal cycle             Cold (RT)             Hot (200 °C)

3 s                      1990 Ω/□              -
900 °C peak              µ = 81 cm²/V s        -
                         42.5%                 -

10 s                     1665 Ω/□              -
925 °C peak              µ = 88 cm²/V s        -
                         47%                   -

One each of the hot substrate and room-temperature Se
implanted samples annealed at 925 °C for 10 s was depth
profiled for carrier concentration and mobility using
differential Hall measurements. The results are shown in Fig. 2.
The peak carrier concentration reaches the same level,
~1 × 10¹⁹ cm⁻³, as predicted by the Lindhard-Scharff-Schiott
(LSS) theory, and there is no evidence for any dopant
diffusion either into or out of the material. In contrast,
similarly implanted, furnace annealed samples do not reach the
peak carrier concentration predicted by the LSS theory and
show an appreciable amount of in and out diffusion.
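
For a rough sense of scale, a single-Gaussian approximation to an implant profile has a peak concentration of dose/(√(2π) ΔR_p), so a dose and a peak together imply a straggle. The sketch below is only an illustration under that approximation; the dose of 1.8 × 10¹⁴ cm⁻² and the peak of 1 × 10¹⁹ cm⁻³ are assumed readings of exponents that are garbled in the source scan, and the function names are hypothetical.

```python
import math


def gaussian_peak(dose_cm2, straggle_cm):
    """Peak concentration (cm^-3) of a Gaussian profile with the given
    dose and standard deviation (straggle)."""
    return dose_cm2 / (math.sqrt(2.0 * math.pi) * straggle_cm)


def implied_straggle_nm(dose_cm2, peak_cm3):
    """Straggle (nm) that a Gaussian profile would need in order to
    reach peak_cm3 for the given dose."""
    straggle_cm = dose_cm2 / (math.sqrt(2.0 * math.pi) * peak_cm3)
    return straggle_cm * 1.0e7  # 1 cm = 1e7 nm


# ASSUMED values: both exponents are illegible in the scanned source.
straggle_se = implied_straggle_nm(1.8e14, 1.0e19)
```

With those assumed numbers the implied straggle is a little over 70 nm, which is only a plausibility check and not a value reported in the letter.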

The beryllium implants into heated substrates showed
very low concentration and were not studied further. This
behavior appears to be similar to that seen in furnace
annealed hot Be implants. The results on the room-temperature
Be implants were much better and the activation and
mobility are similar to what is obtained from a furnace
anneal for similar doses and energies. As with Se, the 10-s
anneal to 925 °C was superior to the slightly lower temperature
3-s anneal, but the difference between the two cycles is
much less.

Both room-temperature Be implants were profiled using
differential Hall measurements and the results are
illustrated in Fig. 3. For furnace annealed samples appreciable
diffusion of Be into the material is observed; no such
diffusion is found in the arc lamp annealed samples, as can be seen
in Fig. 3. The activations for both annealing cycles are
similar, but the mobility is slightly better for the higher
temperature cycle.

FIG. 2. Carrier concentration and mobility profiles measured by the
differential Hall technique on two Se implanted samples: one implanted at room
temperature and one at 200 °C, both arc-lamp annealed for 10 s to a peak
temperature of 925 °C.

FIG. 3. Carrier concentration and mobility profiles measured by the
differential Hall technique on two Be-implanted samples: one arc lamp annealed
following the 3-s cycle, the other annealed following the 10-s cycle.
Figure legend: InP, high dose Be, arc lamp anneal; room-temperature
substrate; 120 nm SiO₂ encapsulation; 10-s cycle, 44% activation; 3-s
cycle, 43% activation. Axes: carrier concentration and mobility versus
depth (µm).

Appl. Phys. Lett., Vol. 43, No. 4, 15 August 1983, p. 382, Masum Choudhury et al.

In conclusion, the activation and carrier mobilities
achieved in InP from Be and Se implantation using a high
power arc lamp for the post-implant anneal are comparable
to and sometimes better than those obtained from 10-15-min,
750 °C furnace anneals. At the same time, sharp profiles
showing no dopant diffusion are obtained. Another
significant feature of the lamp anneal technique is the relative ease
with which the surface can be encapsulated and protected
during a post-implant anneal. Compared to furnace
annealing techniques, the arc lamp technique is straightforward,
fast, and yields superior results. Work is presently in
progress extending these studies to other dopants and still
higher anneal temperatures.

Electron-beam-induced current measurements in silicon-on-insulator films
prepared by zone-melting recrystallization

Enhanced diffusion of arsenic along grain boundaries and subboundaries in zone-recrystallized
silicon-on-insulator films has been measured by electron-beam-induced current analysis of lateral
pn junctions fabricated in the films. A four-hour diffusion at 1100 °C resulted in protrusions of
arsenic at the junction edges which measured approximately 3-5 µm along the grain boundaries
and only 1-2 µm along the subboundaries. The results suggest that under more ordinary thermal
processing conditions, field-effect transistors with channel lengths greater than about 1.5 µm can
be randomly positioned with respect to the more numerous subboundaries, but grain boundaries
should be avoided.

PACS numbers: 73.40.Lq, 61.70.Ng, 66.30.Jt, 81.10.-h


Zone-melting recrystallization by means of a movable-strip
heat source can produce continuous device-worthy
silicon films over thermally oxidized silicon when a suitable
encapsulation layer is present to prevent silicon
agglomeration. The films consist of large grains, typically 1 mm wide
and extending the length of the zone scan, which are seeded
from a transition region where solidification begins.
Individual grains contain subboundaries, typically 25 µm apart,
which are generally parallel to one another and to the
direction of motion of the molten zone. The subboundaries
consist of linear arrays of dislocations with angular deviations of
the order of one degree or less; they originate at the interior
corner of the solid-liquid interface during recrystallization.
Large-angle grain boundaries can be eliminated by
seeding from the silicon substrate beneath the silicon dioxide
layer or by appropriate patterning of the silicon film prior
to recrystallization. The subboundaries are more difficult to
control; however, entrainment techniques which force the
subboundaries to be positioned at specific regular intervals
have been reported.

In a previous letter, the electrical characteristics of
both grain boundaries and subboundaries were examined by
fabricating large (92 × 315 µm), phosphorus-doped (1 × 10” cm⁻³)
resistors which were either parallel or perpendicular
to the line defects. One or more transversely oriented grain
boundaries produced significant (~15%) bulk conductivity
degradation, whereas an average of 20 subboundaries
produced only marginal (~0.15%) bulk conductivity
degradation. The surface conductivity at the interface between the
silicon film which contained the resistors and the silicon
dioxide below could be modulated by using the silicon
substrate as a gate electrode. Transversely oriented grain
boundaries induced peculiar "kinks" in the turn-on
characteristics of these "upside-down" depletion-mode field-effect
transistors. The influence of subboundaries on surface
conductivity could not be detected. These results suggest that
grain boundaries must be avoided when considering the
placement of field-effect transistors on zone-recrystallized
silicon-on-insulator films and that the degradation of
electrical properties due to subboundaries can be ignored.

As the dimensions of field-effect transistors fabricated
on zone-recrystallized silicon-on-insulator films are
reduced, another potential problem arises, i.e., the enhanced
diffusion of dopant impurities during high-temperature
processing along line defects which form a connective path
between source and drain. This phenomenon shortens the
effective channel length of the transistor and ultimately
leads to an abrupt increase in subthreshold leakage current.
Enhanced diffusion of arsenic along grain boundaries in
laser-recrystallized silicon-on-insulator films was shown to be
significant when channel lengths are less than 3 µm, given a
source-drain anneal at 900 °C for 90 min.

In an earlier study, enhanced diffusion of arsenic
along grain boundaries in the vicinity of lateral junctions
fabricated in laser-recrystallized silicon films was measured
by electron-beam-induced current (EBIC) analysis, and the
results were compared with scanning-electron-beam
voltage-induced contrast as a check for measurement
consistency. A five-hour arsenic diffusion at 1000 °C resulted in
protrusions of arsenic at the junction edges extending up to 5 µm
along the grain boundaries. The different lengths of the
protrusions suggested significant variations in the
microstructures or diffusivities of the grain boundaries.

This letter presents the results of EBIC analysis of
lateral pn junctions fabricated in zone-recrystallized
silicon-on-insulator films. In recognition of the electrical and
crystallographic differences between grain boundaries and
subboundaries, particular effort has been made to compare
the enhanced diffusion along the two types of line defect.
Earlier work suggested a difference in diffusion
characteristics, but the measurements were made by a less accurate
procedure in which the arsenic-rich regions of the lateral pn
junctions were preferentially etched. If minimal diffusion along
subboundaries could be demonstrated, it would further
justify the random placement of devices with respect to the
subboundaries, thereby simplifying the processing
requirements associated with zone-recrystallized silicon-on-insulator
films.

Prior to recrystallization, samples were prepared by
oxidizing (100) p-type silicon wafers at 1100 °C to obtain a
silicon dioxide thickness of 0.5 µm. The thermal oxide was
sequentially coated with a 0.5-µm-thick layer of low pressure
chemical vapor deposition (LPCVD) polycrystalline
silicon, a 2.0-µm-thick layer of CVD silicon dioxide, and a
0.03-µm-thick layer of sputtered silicon nitride. The details
of the zone-melting recrystallization procedure have been
reported elsewhere. Briefly, the sample was placed within
an argon ambient on a graphite strip which was resistively
heated to 1000 °C. A movable upper strip, approximately 1
mm above the sample and 1 mm wide, was resistively heated
until the polycrystalline silicon film melted, and the molten
zone was moved across the sample at a speed of
approximately 1 mm/s. After recrystallization, the silicon dioxide
and silicon nitride encapsulation layers were removed in
concentrated hydrofluoric acid. The silicon film was
implanted with a fluence of 2 × 10‘‘ cm⁻² boron at 70 keV. The
boron was uniformly redistributed throughout the silicon
film at a later point in the process sequence to yield a
concentration of 4 × 10'* cm⁻³.

In order to distinguish the subboundaries from the grain boundaries, the
films were decorated by a regular matrix of anisotropically etched pits.
These were prepared by depositing a 0.4-µm-thick layer of CVD silicon
dioxide over the silicon film and then etching 5-µm-diam holes on 50-µm
center spacing. The exposed silicon was etched in a potassium hydroxide
solution until square pits revealed the local crystallographic orientations
throughout the film. As in other experiments,'"* the texture of the film
was (100).

The fabrication of lateral pn diodes began with the opening of large (up to
2 × 2 mm) holes in the silicon dioxide which had been used to define the
matrix of etch pits. The holes defined the diode emitter (n⁺) regions and
were implanted with a fluence of 1 × 10'* cm⁻² arsenic at 150 keV. The
implant was annealed at 1100 °C for 10 min in dry oxygen to form a thin
silicon dioxide cap and then for 4 h in dry nitrogen to exacerbate the
degree of enhanced diffusion which was expected along the grain boundaries.
The silicon dioxide over the diode emitter regions was removed following the
anneal. Aluminum was deposited over the sample and then defined to form
contact pads which were slightly recessed from the junction edges. The
fabrication process concluded by etching the silicon dioxide which remained
over the silicon film.

After processing, the samples were mounted on headers and aluminum wires
were bonded to the diode n⁺ contact pads. There was no direct electrical
contact available to the diode p regions (field); the diodes were connected
back-to-back in pairs without an applied bias voltage. A JSM-50 (Japanese
Electron Optics Limited) scanning electron microscope was used for the EBIC
analysis. With a 5-keV electron beam, the EBIC resolution was presumed to be
consistent with that established in Ref. 12, approximately 0.5 µm.

FIG. 1. EBIC image in the vicinity of a lateral pn junction which is
intersected by grain boundaries. The large dark square which extends from
the diode n⁺ region is an anisotropically etched pit used to determine grain
orientations in the recrystallized silicon film. A second dark square at the
right indicates that a grain boundary is located between the two squares,
and the misorientation of the squares indicates that the angle of the grain
boundary is approximately 20°. Note the large protrusions where the grain
boundaries intersect the junction edge.

FIG. 2. (a) Secondary electron image in the vicinity of a lateral pn
junction which is intersected by subboundaries. The distinct contrast is due
to preferential electron channeling. (b) Corresponding EBIC image. Note the
small protrusions along the subboundaries.

Many grain boundaries were located near the transition region of the
zone-recrystallized film. A typical EBIC image from this region is shown in
Fig. 1. The arsenic protrusions extend approximately 3-5 µm from the edge of
the lateral pn junction. A protrusion length is assumed to be roughly equal
to a diffusion length (Dt)^(1/2), where D is the coefficient of line defect
diffusion and t is the diffusion time. From this relation, the coefficient
for arsenic diffusion along a grain boundary at 1100 °C is about
1 × 10⁻¹¹ cm²/s. The value which was estimated in Ref. 12 for the case of
arsenic diffusion at 1000 °C was also about 1 × 10⁻¹¹ cm²/s. If the
enhanced diffusion which brings about the transistor failure mode of Ref. 11
is assumed to correspond to a diffusion length of 1.5 µm for an arsenic
diffusion time of 90 min at 900 °C, the estimated coefficient of grain
boundary diffusion for this case is approximately 4 × 10⁻¹² cm²/s. Since
these particular estimates vary as the square of a diffusion length and yet
are comparable to within a factor of 2, the degree of enhanced arsenic
diffusion along grain boundaries appears to be weakly dependent upon
temperature over the range of 900-1100 °C. This conclusion is not
consistent with that of earlier work'^ in which the enhanced diffusion of
arsenic along grain boundaries in laser-recrystallized silicon-on-insulator
films suggested a thermal activation energy of about 2.3 eV.
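
The estimates above can be reproduced with a short calculation. The sketch
below is only an illustration: it assumes that the protrusion length equals
(Dt)^(1/2), takes 4 µm as a representative value of the 3-5 µm range, and
uses the full 4 h 10 min anneal as the diffusion time at 1100 °C. The
function names are hypothetical. It also shows the change in D that a
2.3-eV Arrhenius law would predict between 900 and 1100 °C, for comparison
with the much smaller spread of the three estimates.

```python
import math

K_B_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def diffusion_coefficient(length_um, time_s):
    """Return D in cm^2/s from L = sqrt(D * t), with L given in micrometers."""
    length_cm = length_um * 1e-4
    return length_cm ** 2 / time_s


def arrhenius_ratio(activation_ev, t_low_c, t_high_c):
    """Ratio D(t_high) / D(t_low) for a thermally activated process."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    return math.exp(activation_ev / K_B_EV_PER_K * (1.0 / t_low_k - 1.0 / t_high_k))


# Assumed inputs, taken from the lengths and times quoted in the text.
d_1100 = diffusion_coefficient(4.0, (4 * 60 + 10) * 60)  # this work, 1100 C
d_1000 = diffusion_coefficient(5.0, 5 * 3600)            # Ref. 12, 1000 C
d_900 = diffusion_coefficient(1.5, 90 * 60)              # Ref. 11, 900 C
expected_ratio = arrhenius_ratio(2.3, 900.0, 1100.0)

print(f"D(1100 C) = {d_1100:.2e} cm^2/s")
print(f"D(1000 C) = {d_1000:.2e} cm^2/s")
print(f"D( 900 C) = {d_900:.2e} cm^2/s")
print(f"Ratio expected for 2.3 eV between 900 and 1100 C: {expected_ratio:.0f}")
```

Under these assumptions the three coefficients differ by less than a factor
of 4, whereas a 2.3-eV activation energy would imply a change of more than
an order of magnitude over the same temperature range.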

The EBIC analysis in the vicinity of subboundaries suggests a lesser degree
of enhanced arsenic diffusion. Figure 2(a) shows two subboundaries
separating regions of distinctly different contrast. The contrast results
from preferential electron channeling along certain crystalline
orientations.^ The corresponding EBIC image of Fig. 2(b) shows small
protrusions at the subboundary locations which extend approximately 1 µm
beyond the lateral pn junction edge. The protrusion lengths were sometimes
greater along other subboundaries within the sample, but in no case did a
protrusion extend more than about 2 µm. For a nominal protrusion length of
1.5 µm, the estimated coefficient of arsenic diffusion along subboundaries
has the approximate value of 1 × 10⁻¹² cm²/s, roughly one order of
magnitude less than for the grain boundaries. Under more ordinary processing
conditions, the subboundary diffusion length scales downward by a factor of
2 as the diffusion time is reduced to one hour, and some degree of scaling
can be expected as the diffusion temperature is reduced. These results
suggest that field-effect transistors can be randomly positioned with
respect to the subboundaries provided that channel lengths are greater than
about 1.5 µm.
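
The factor-of-2 scaling quoted above follows from the square-root dependence
of diffusion length on time. The following minimal sketch assumes a constant
diffusion coefficient and takes the 4-h nitrogen anneal as the reference
time; the function name is hypothetical.

```python
import math


def scaled_diffusion_length(length_um, reference_time_s, new_time_s):
    """Scale a diffusion length L = sqrt(D * t) to a new time at constant D."""
    return length_um * math.sqrt(new_time_s / reference_time_s)


# Nominal 1.5-um subboundary protrusion after a 4-h anneal, rescaled to 1 h.
one_hour_length_um = scaled_diffusion_length(1.5, 4 * 3600, 1 * 3600)
print(f"Estimated protrusion length after 1 h: {one_hour_length_um:.2f} um")
```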

Rapid thermal annealing of Be, Si, and Zn implanted GaAs using an ultrahigh

The use of a 100-kW water-walled dc argon lamp to anneal ion-implanted GaAs is reported.
Annealing cycles of 3 and 10 s and peak temperatures from 950 to 1200 °C have been used to
anneal Be, Si, and Zn implanted following representative implant schedules of technological
importance. It is demonstrated that this technique is superior to conventional furnace anneal
techniques in terms of the doping profiles, peak carrier concentrations, activation efficiencies
(particularly at high doses), and mobilities achieved. The annealing technique should be
applicable to large volume GaAs integrated circuit production, and 100-mm-diam wafers can be
annealed in a single exposure with better than 2% temperature uniformity (Si data).

PACS  numbers:  81.40.Ef,  61.70.Tm,  72.80.Ey,  81.40.Rs 


The use of rapid thermal annealing (RTA) techniques to anneal ion-implanted
GaAs and related III-V compound semiconductors promises to have a
significant impact on the device technology of these materials. RTA should
permit higher anneal temperatures with improved dopant activation and
carrier mobilities, yield sharper junctions, and result in higher doping
levels than can be achieved by furnace anneal techniques, while
simultaneously reducing capping requirements and thereby significantly
simplifying wafer processing. In this letter, we demonstrate that these
objectives can be achieved when a high-power, water-walled dc arc lamp,
capable of processing 125-mm-diam wafers, is used to rapidly anneal GaAs.

Several RTA techniques have been applied to GaAs with good initial success,
but each also has important limitations. Arai et al. (Ref. 1) first reported
using halogen lamps to anneal GaAs. The GaAs was placed face down on a Si
wafer and annealed for 5 s with a peak temperature of 950 °C. Nearly 100%
activation of a 3 × 10'* cm⁻², 70-keV Si implant was achieved. More
recently, Kuzuhara et al. (Ref. 2) used the same technique to anneal a
higher dose Si implant (5 × 10'* cm⁻², 100 keV) and obtained 75% activation
and a sheet mobility of 3700 cm²/V s after a 2-s, 950 °C peak temperature
anneal. They observed arsenic loss and the formation of gallium pits on the
wafer surface if they used a 5-s anneal, however. The maximum rate of
temperature increase in these experiments was 200 °C/min.

Davies et al. (Ref. 3) have used filament lamps focused by elliptical
mirrors to anneal GaAs implanted with zinc and silicon to 1000 °C. They
achieved essentially 100% activation of a 2 × 10'* cm⁻², 200-keV Zn implant
and 50% activation of a 4 × 10'* cm⁻², 200-keV Si implant. The system they
used could heat the sample to 1000 °C in one second, but had a long fall
time because of the nearly lossless optical cavity used.

Chapman et al. (Ref. 4) have used a graphite strip heater to anneal GaAs to
1140 °C for 10 s. They obtained 18% activation of a 1 × 10'’ cm⁻², 400-keV
Se implant done into a hot (300 °C) substrate. They observed significant
diffusion of Si into the GaAs from the Si₃N₄ cap used. With the graphite
heater strip, the temperature rise was relatively slow.

As presently configured, the sources that have been used for the RTA of GaAs
in the above studies have important limitations. The problems of the rise
and fall times of the heating cycles have already been mentioned. The latter
two systems also were line sources and, thus, are unsuitable for annealing
large area samples because the large thermal gradients created in samples as
the line is scanned across them can cause the formation of slip planes. The
graphite strip heater further has the problem that it must be operated in a
vacuum, and the possibility of contamination from the porous graphite also
exists.

The system used in the present work overcomes these difficulties and, as
well, achieves results superior to those of previously reported furnace and
RTA anneal procedures.

The arc lamp annealing system used has been described previously by Gelpey
and Stump (Ref. 5). Briefly, the light source is a 100-kW water-walled dc
argon arc lamp which has a time constant of a few milliseconds and a color
temperature of 5500 K, and produces a uniform power flux density of
80 W/cm² at full intensity (400-A lamp current) over the sample surface.
Temperature uniformities of 2% have been measured over 100-mm-diam Si
wafers.

The GaAs used in this study was semi-insulating undoped material. The ion
implantation sequences (species, energy, and dose) are listed in columns 1-3
of Table I. All implants were performed into bare surfaces with the samples
nominally at room temperature. After implantation and prior to annealing, a
120-nm-thick layer of SiO₂ was sputter deposited on both surfaces of the
wafer to serve as the encapsulation. During the anneal, the samples were
held in small slotted quartz pins so that they were thermally (conductively)
isolated from their surroundings. The sample temperature was monitored using
an optical pyrometer viewing the back of the GaAs wafer. The temperature
measurement in the present work is thought to be accurate to ± 20 °C.

TABLE I. Implant schedules and annealing cycles used in this study, and the
results of Hall measurements of the surface layers, including the sheet
carrier concentration n_s, sheet Hall mobility µ_s, and sheet resistance R_s.
All samples were undoped semi-insulating GaAs wafers.

                                                          Anneal cycle
Species  Energy   Dose                    Thickness  Time  Pk. temp.  n_s           µ_s        R_s
         (keV)    (cm⁻²)                  (µm)       (s)   (°C)       (cm⁻²)        (cm²/V s)  (Ω/□)

Zn       200      1.4 × 10¹⁵              415        10    950        7.2 × 10¹⁴    62         140
Zn       200      1.4 × 10¹⁵              415        10    1020       6.8 × 10¹⁴    70         130
Zn       200      1.4 × 10¹⁵              415        3     1160       1.3 × 10¹⁵    62         80
Si       200      4 × 10¹⁴                415        10    950        no activation
Si       200      4 × 10¹⁴                415        10    1020       1.8 × 10¹³    1850       90
Si       200      4 × 10¹⁴                415        10    1100       1.15 × 10¹⁴   1600       33
Si       200      4 × 10¹⁴                415        3     1160       1.25 × 10¹⁴   1500       35
Be       50, 150  4.4 × 10¹⁴, 5.1 × 10¹⁴  415        10    950        5.4 × 10¹⁴    92         125
Be       50, 150  4.4 × 10¹⁴, 5.1 × 10¹⁴  415        10    1020       5.8 × 10¹⁴    67         160
Be       50, 150  4.4 × 10¹⁴, 5.1 × 10¹⁴  415        10    1100       6.5 × 10¹⁴    33         300
Be       50, 150  4.4 × 10¹⁴, 5.1 × 10¹⁴  415        3     1160       6.2 × 10¹⁴    65         160
None     None     None                    365        None  None       2.1 × 10*     2300       1.3 × 10*
None     None     None                    365        10    1050       4.9 × 10*     612        2.1 × 10*
None     None     None                    365        10    1100       2.3 × 10*     75         3.8 × 10*
None     None     None                    365        3     1200       2 × 10*       36         9.1 × 10*
None     None     None                    525        None  None       3.15 × 10*    5500       3.6 × 10*
None     None     None                    525        10    1050       3.4 × 10’     202        9.1 × 10*
None     None     None                    525        10    1090       2 × 10*       10         3 × 10*
None     None     None                    525        3     1100       8.9 × 10’     135        5.2 × 10*
None     None     None                    Furnace (see text)          1 × 10*       150        4 × 10*


The time-temperature cycles used were similar to those recently used to
anneal InP.*' No change could be observed in the surface morphology of the
GaAs after any of the anneal cycles used in the results reported in this
study.

After the anneal, the SiO₂ was removed from the surface of the samples, they
were patterned with a standard van der Pauw pattern, and their sheet
resistivity, carrier concentration, and mobility were measured. The anneal
cycle used and the results of this initial characterization are presented in
columns 4-7 of Table I.

Several of the samples were further profiled using differential Hall
measurements.^ These profiles are presented in Figs. 1-3.

Zn implant. A peak hole concentration ≈ 9 × 10'* cm⁻³ (see Fig. 1), sheet
mobility of 62 cm²/V s, sheet resistance of 80 Ω/□, and 93% electrical
activation are obtained after a 3-s anneal to 1160 °C. To the best of our
knowledge, this represents the highest doping level and electrical
activation that has been reported for zinc implantation into GaAs
(liquid-phase-epitaxial-regrowth annealing techniques are excluded). The
mobility (Fig. 1) compares favorably with mobility data for zinc diffusion
in GaAs at the same doping level.*’’ The zinc concentration profile is
consistent with the generally accepted interstitial-substitutional zinc
diffusion theory in GaAs (Ref. 18), although the extent of diffusion is more
than expected for such a short annealing time. In a related experiment, no
enhanced lateral diffusion of zinc, implanted through photoresist into
10-µm-wide stripes, could be detected after a similar arc lamp anneal in
spite of the presence of the SiO₂ capping layer.
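
The Hall quantities quoted for the zinc implant can be cross-checked against
one another. The sketch below is only an illustration and assumes the
standard single-carrier sheet relation R_s = 1/(q n_s µ_s); the function
name is hypothetical. It computes the sheet hole concentration implied by
the quoted mobility and sheet resistance, and the implant dose that the
quoted 93% activation would then correspond to.

```python
ELEMENTARY_CHARGE_C = 1.602e-19  # coulombs


def sheet_carrier_concentration(mobility_cm2_per_vs, sheet_resistance_ohm_sq):
    """Return n_s in cm^-2 from R_s = 1 / (q * n_s * mu_s)."""
    return 1.0 / (ELEMENTARY_CHARGE_C * mobility_cm2_per_vs * sheet_resistance_ohm_sq)


# Values quoted in the text for the 3-s, 1160 C zinc anneal.
n_s_zn = sheet_carrier_concentration(62.0, 80.0)
implied_dose_zn = n_s_zn / 0.93  # dose consistent with 93% activation

print(f"Sheet hole concentration: {n_s_zn:.2e} cm^-2")
print(f"Dose implied by 93% activation: {implied_dose_zn:.2e} cm^-2")
```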

Be implant. All cycles yielded similar results, but the best results are
obtained for the lower temperature anneal cycle (950 °C). A maximum hole
concentration ≈ 2 × 10” cm⁻³ and an electrical activation of 72% are
obtained (see Fig. 3). In a related experiment, the junction depth in vapor
phase epitaxially (VPE) grown GaAs (N_D − N_A = 8 × 10” cm⁻³), which had
been identically Be implanted and annealed, was found to be 1-1.1 µm. These
results are superior to the best of published furnace anneal data”'" in
several aspects. In furnace annealing, which is usually done at 800-900 °C
for 10-15 min, significant damage-enhanced diffusion of Be occurs. This
causes junction misplacement and limits the maximum Be concentration to the
mid 10'* cm⁻³ range with ≈ 40% electrical activation at a 10” cm⁻² dose
level. No Be diffusion takes place upon rapid thermal annealing and a high
peak concentration is maintained. It is also found that mesa diodes
fabricated in this VPE material are superior to the best published Be
implanted furnace annealed diodes. These electrical results will be reported
in a succeeding paper.

FIG. 1. Carrier concentration and mobility profiles measured by the
differential Hall technique on arc lamp annealed zinc implanted
(1.4 × 10¹⁵ cm⁻², 200 keV) GaAs samples annealed for 3 s to 1160 °C.

FIG. 2. Carrier concentration and mobility profiles measured on two silicon
implanted (4 × 10¹⁴ cm⁻², 200 keV) GaAs samples: one annealed 3 s to
1160 °C, the other annealed 10 s to 1100 °C.

FIG. 3. Carrier concentration and mobility profiles measured on beryllium
implanted (4.4 × 10¹⁴ cm⁻², 50 keV and 5.1 × 10¹⁴ cm⁻², 150 keV) GaAs
samples annealed for 10 s to 950 °C.

Si implant. Both 10-s and 3-s anneals, at maximum temperatures of 1100 and
1160 °C respectively, give similar results. A maximum carrier concentration
of 7.5 × 10'* cm⁻³, sheet resistance of 35 Ω/□, and 33% electrical
activation are obtained (see Fig. 2). At the same implant energy and dose,
Davies et al. (Ref. 3) have reported 50% electrical activation and a sheet
resistance of 23.9 Ω/□ after a 2.5-s anneal with a maximum temperature of
1000 °C. However, the temperature measurement in Ref. 3 may not be accurate
because of the slow response time of the thermocouple, and we believe that
the 50% electrical activation reported was obtained at temperatures much
higher than 1000 °C.

Semi-insulating material. As is shown in Table I, the sheet resistance of
unimplanted semi-insulating undoped GaAs after arc lamp annealing is either
comparable to or higher than that obtained after 15 min of face-to-face
proximity cap annealing (typical results of many runs) under argon
atmosphere at 850 °C.'^ Results for furnace annealing under arsenic
overpressure are similar to the proximity cap annealing. To our knowledge,
this is the first quantitative report on the effect of rapid thermal
annealing on the sheet resistance of semi-insulating undoped GaAs. If we use
the sheet mobility as an indication of surface quality, the 1050 °C anneals
are superior to furnace annealing while the 1100 °C anneal is comparable. No
surface conversion to p-type was observed even in anneals as high as
1200 °C.

In conclusion, it has been demonstrated that an ultrahigh intensity arc lamp
can be used to rapidly thermal anneal ion-implanted GaAs with superior
results in terms of activation efficiency, peak doping level, and sharpness
of doping profile both laterally and vertically. The applicability of this
system to large area wafers, the rapid response time of the lamp system, and
the ease with which the sample surface can be encapsulated during processing
are all attractive features and important advantages of this technique.

ABSTRACT

This paper considers the problem of maximizing the energy or
average power transfer from a nonlinear dynamic n-port source.
The main theorem includes as special cases the standard
linear result $Y_{\mathrm{load}} = Y^{*}_{\mathrm{source}}$ and a recent
finding for nonlinear resistive networks. An operator equation for the
optimal output voltage $\hat{v}(\cdot)$ is derived, and a numerical method
for solving it is given.

I. Introduction

This paper addresses the problem of extracting the maximum energy or
average power from a source with the topology shown in Fig. 1. As in [1],¹
the problem is formulated as finding the optimal output voltage
$\hat{v}(\cdot)$ for each current source waveform $i_s(\cdot)$ rather than
finding a load that maximizes the power.

The central result is the operator equation (6) for $\hat{v}(\cdot)$.
Theorem 1 gives conditions that guarantee uniqueness and global optimality
of the solution: the standard result for linear systems [1] and recent work
on resistive nonlinear systems [2] follow as special cases. Equation (11)
defines a practical algorithm for solving (6), and Theorem 2 gives
conditions that guarantee convergence.

The solution $\hat{v}(\cdot)$ can be of engineering value in two ways.
First, the average power $P(\hat{v})$ tells us the optimal performance that
is possible in principle. Second, $\hat{v}(\cdot)$ itself is a concrete
design goal. If the source admittance operator $F$ is continuous, a load for
which the output approximates $\hat{v}(\cdot)$ (in the Hilbert space norm
used in this work) will absorb an average power that approximates
$P(\hat{v})$.


1. Reference [1] actually deals with the dual network, where the source
appears in Thevenin form.

II. Results

2.1) Notation and Definitions

Let $\mathcal{L}$ be any real inner product space and $L$ any linear
subspace of $\mathcal{L}$. An operator $F: L \to L$ is said to be

a) strictly increasing if

$$\langle F(y) - F(x),\, y - x \rangle > 0, \quad \forall x \neq y \in L, \qquad (1)$$

b) uniformly increasing if for some $\delta > 0$,

$$\langle F(y) - F(x),\, y - x \rangle \geq \delta \|y - x\|^2, \quad \forall x, y \in L, \qquad (2)$$

c) Lipschitz continuous if for some $K \geq 0$,

$$\|F(y) - F(x)\| \leq K \|y - x\|, \quad \forall x, y \in L. \qquad (3)$$

Let $\mathcal{L}$, $\mathcal{L}'$ be any real inner product spaces and let
$L(\mathcal{L}, \mathcal{L}')$ denote the space of continuous linear maps
from $\mathcal{L}$ to $\mathcal{L}'$, with the operator norm [3, p.53].
For $A \in L(\mathcal{L}, \mathcal{L}')$, let $A^{\mathrm{adj}}$ denote the
adjoint of $A$.

Given an operator $F: \mathcal{L} \to \mathcal{L}'$ and
$x, h \in \mathcal{L}$, suppose there exists an element of $\mathcal{L}'$,
denoted $\delta F(x,h)$, such that

$$\lim_{t \to 0} \left\| \frac{F(x + th) - F(x)}{t} - \delta F(x,h) \right\|_{\mathcal{L}'} = 0.$$

Then $\delta F(x,h)$ is called the Gateaux variation of $F$ at $x$ for the
increment $h$ [4, p.251]. If $\delta F(x,h)$ exists for all
$x, h \in \mathcal{L}$ and if for each $x \in \mathcal{L}$ the map
$h \mapsto \delta F(x,h)$ is an element of $L(\mathcal{L}, \mathcal{L}')$,
then $F$ is said to be Gateaux differentiable on $\mathcal{L}$. In this case
the map $x \mapsto \delta F(x,\cdot)$ is called the Gateaux derivative of
$F$ and denoted $DF: \mathcal{L} \to L(\mathcal{L}, \mathcal{L}')$
[4, pp.255-256]. Similarly, $\delta F(x,\cdot)$ is denoted
$DF(x) \in L(\mathcal{L}, \mathcal{L}')$, and $\delta F(x,h)$ is denoted
$(DF(x))h \in \mathcal{L}'$. The value of using the Gateaux derivative
rather than the more restrictive Frechet derivative [4, Chap. 3] will become
apparent in Section 3.1.

The Hilbert space $\mathcal{L}^2_n$ is the set of all measurable functions
$x: \mathbb{R} \to \mathbb{R}^n$ such that the integral of $x_j^2(\cdot)$
over $\mathbb{R}$ is finite, $j = 1, \ldots, n$, equipped with the usual
inner product $\langle \cdot, \cdot \rangle$ and norm
$\|x\| = \langle x, x \rangle^{1/2}$.

For each $T > 0$, $\mathcal{L}^2_{n,T}$ is the set of all periodic
measurable functions $x: \mathbb{R} \to \mathbb{R}^n$ with period $T$ such
that the integral over one period of $x_j^2(\cdot)$ is finite,
$j = 1, \ldots, n$. It is a Hilbert space with the "average power" inner
product

$$\langle x, y \rangle_T = \frac{1}{T} \int_0^T x(t) \cdot y(t)\, dt, \qquad (4)$$

where $x \cdot y$ is the Euclidean inner product on $\mathbb{R}^n$. The norm
on $\mathcal{L}^2_{n,T}$ is denoted $\|x\|_T = \langle x, x \rangle_T^{1/2}$.

2.2) Main Theorem

Theorem 1 (Maximum Average Power in the Periodic Steady State)

Fix $T > 0$ and let the source n-port in Fig. 1 be characterized by an
admittance operator $F: L_T \to L_T$,² where $L_T$ is any linear subspace of
$\mathcal{L}^2_{n,T}$. Suppose $F$ is Gateaux differentiable on $L_T$ and
that the associated operator $H: L_T \to L_T$, given by

$$H: v \mapsto F(v) + (DF(v))^{\mathrm{adj}}\, v, \qquad (5)$$

is strictly increasing.

Then for each $i_s \in H(L_T)$ there is a unique solution
$\hat{v}(i_s) \in L_T$ to

$$i_s = H(v), \qquad (6)$$

and the average power absorbed by the load,³

$$P(v) = \langle v,\, i_s - F(v) \rangle_T, \qquad (7)$$

has a unique global maximum over $L_T$, which is attained at
$v = \hat{v}(i_s)$.

Corollary (Maximum Total Energy for Transients)

Let $L$ be a linear subspace of $\mathcal{L}^2_n$ and substitute $L$ for
$L_T$ in the assumptions of Theorem 1.⁴ Then the same conclusions hold, but
with $\hat{v}(i_s) \in L$ maximizing the total energy
$E(v) = \langle v,\, i_s - F(v) \rangle$ over $L$.

Note that in general $F$ can be nonlinear and time-varying.

In applications one might wish to restrict attention to currents and
voltages in $\mathcal{L}^2_{n,T}$ with additional properties such as
continuity or boundedness. This is the reason for introducing
$L_T \subset \mathcal{L}^2_{n,T}$ in the formulation of Theorem 1.

The essential idea behind the theorem is that a solution $\hat{v}(\cdot)$ of
(6) is a stationary point of $P: L_T \to \mathbb{R}$, and the monotonicity
assumption on $H$ guarantees that $P$ is strictly concave. Details follow.


2. Thus, if $v(\cdot)$ has period $T$ and lies in $L_T$, the response
$i(\cdot)$ of the source n-port cannot have subharmonics.

3. A more explicit, but cumbersome, notation would be $P(v, i_s)$. Using it,
Theorem 1 states that for all $v \in L_T$ and $i_s \in H(L_T)$,
$P(v, i_s) < P(\hat{v}(i_s), i_s)$ if $v \neq \hat{v}(i_s)$.

4. For the Corollary, the adjoint is of course taken with respect to the
inner product on $\mathcal{L}^2_n$ rather than $\mathcal{L}^2_{n,T}$.
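
Proof of Theorem 1

Uniqueness of the solution to (6) follows from the fact that $H$ is
strictly increasing. By the chain rule for the composition of Frechet- and
Gateaux-differentiable functions [4, p.253], $P$ is Gateaux differentiable
and for all $x, h \in L_T$,

$$(DP(x))h = \langle i_s - F(x),\, h \rangle_T - \langle x,\, (DF(x))h \rangle_T = \langle i_s - H(x),\, h \rangle_T. \qquad (8)$$

Thus if $i_s \in H(L_T)$,

a) $DP(\hat{v}(i_s)) = 0 \in L(L_T, \mathbb{R})$,

b) given any $x, y \in L_T$, the map $\lambda \mapsto P[x + \lambda(y - x)]$
is differentiable at each $\lambda \in \mathbb{R}$, and

c) $\frac{d}{d\lambda} P[x + \lambda(y - x)] = \langle i_s - H[x + \lambda(y - x)],\, (y - x) \rangle_T$.

To show that $\hat{v}(i_s)$ globally optimizes $P$, fix $i_s \in H(L_T)$,
let $\hat{v} = \hat{v}(i_s)$, and choose any $v \in L_T$, $v \neq \hat{v}$.
Then

$$P(v) - P(\hat{v}) = P[\hat{v} + \lambda(v - \hat{v})]\Big|_{\lambda = 1} - P[\hat{v} + \lambda(v - \hat{v})]\Big|_{\lambda = 0} = \int_0^1 \frac{d}{d\lambda} P[\hat{v} + \lambda(v - \hat{v})]\, d\lambda. \qquad (9)$$

Using c), the integrand above is

$$\langle i_s - H[\hat{v} + \lambda(v - \hat{v})],\, v - \hat{v} \rangle_T = -\frac{1}{\lambda} \langle H[\hat{v} + \lambda(v - \hat{v})] - H(\hat{v}),\, [\hat{v} + \lambda(v - \hat{v})] - [\hat{v}] \rangle_T, \quad \forall \lambda > 0,$$

(since $i_s = H(\hat{v})$), and the integrand vanishes at $\lambda = 0$. The
inner product above is strictly positive for $\lambda > 0$ since $H$ is
strictly increasing by assumption. Thus the integrand in (9) is negative for
$\lambda > 0$ and zero for $\lambda = 0$, so $P(v) < P(\hat{v})$ as claimed.

The proof of the Corollary is essentially identical and will be omitted.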
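
As an illustration of Theorem 1, the following sketch works through a
finite-dimensional stand-in that is not taken from the paper: a linear
resistive n-port driven by constant (dc) currents, so that signals are
vectors in R^n, the admittance operator is F(v) = Yv for an assumed matrix
Y, the inner product is the Euclidean one, and the adjoint is the matrix
transpose. In that setting H(v) = (Y + Y^T)v. The matrix, the source vector,
and the function name are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Assumed source admittance matrix: positive definite symmetric part plus a
# skew-symmetric part, so that H = Y + Y^T is strictly increasing.
A = rng.standard_normal((n, n))
Y = A @ A.T + n * np.eye(n) + 0.5 * (A - A.T)
i_s = rng.standard_normal(n)


def power(v):
    """Power absorbed by the load, P(v) = <v, i_s - F(v)> with F(v) = Y v."""
    return float(v @ (i_s - Y @ v))


# Equation (6) in this setting: i_s = H(v) = (Y + Y^T) v.
H = Y + Y.T
v_hat = np.linalg.solve(H, i_s)

# At the optimum the load current equals Y^T v_hat, i.e. the load behaves as
# the adjoint (here, the transpose) of the source admittance.
load_current = i_s - Y @ v_hat

print("optimal voltage:", v_hat)
print("maximum power:  ", power(v_hat))
```

Perturbing `v_hat` in any direction lowers `power`, which is the strict
concavity used in the proof.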


2.3) Relation to "Impedance Matching" Ideas

The emphasis in this paper is on finding the optimal output voltage
$\hat{v}(\cdot)$, not the optimal load. But the relation to impedance
matching ideas deserves comment.

If the load in Fig. 1 is taken to be the (generally noncausal) admittance

$$\hat{G}_{\mathrm{opt}}: L_T \to L_T, \qquad \hat{G}_{\mathrm{opt}}: v \mapsto (DF(v))^{\mathrm{adj}}\, v, \qquad (10)$$

then the network is uniquely solvable given any $i_s \in H(L_T)$, and the
output voltage $v(\cdot)$, which necessarily equals $\hat{v}(i_s)$, globally
optimizes $P$. This generally noncausal load is "matched" to the source for
all inputs $i_s \in H(L_T)$, and this result holds generally for a
nonlinear, time-varying, even noncausal source admittance $F$. The reader
can easily verify that in the LTI case (10) reduces to the standard linear
theorem $Y_{\mathrm{load}}(j\omega) = Y^{*}_{\mathrm{source}}(j\omega)$.
More detail for the linear 1-port case is given in Section 3.1.
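
The LTI reduction can be sketched in a few lines. The outline below is only
a plausibility argument under added assumptions: $v$ is $T$-periodic with
Fourier coefficients $V_k$ (fundamental $\omega_0 = 2\pi/T$), the source is
described by a frequency response $Y_{\mathrm{source}}(j\omega)$, and
Parseval's relation is used for the average power inner product. The
superscript $H$ denotes the conjugate transpose and is unrelated to the
operator $H$ of (5).

```latex
% Sketch of the LTI case, written with Fourier coefficients (an assumption
% of this illustration, not notation from the paper).
\begin{align*}
(Fv)_k &= Y_{\mathrm{source}}(jk\omega_0)\, V_k,
  \qquad DF(v) = F \quad \text{(since $F$ is linear)},\\
\langle x, Fv \rangle_T
  &= \sum_k X_k^{H}\, Y_{\mathrm{source}}(jk\omega_0)\, V_k
   = \sum_k \bigl( Y_{\mathrm{source}}^{H}(jk\omega_0)\, X_k \bigr)^{H} V_k
   = \langle F^{\mathrm{adj}} x, v \rangle_T,\\
(\hat{G}_{\mathrm{opt}}\, v)_k
  &= \bigl( (DF(v))^{\mathrm{adj}}\, v \bigr)_k
   = Y_{\mathrm{source}}^{H}(jk\omega_0)\, V_k .
\end{align*}
% For a 1-port the conjugate transpose is the complex conjugate, giving
% Y_load(j\omega) = Y_source^*(j\omega) at each harmonic.
```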

Of course in practice one has a causal load, usually predetermined, and
wishes to couple it to the source through a lossless matching network
designed to maximize the absorbed power over a range of inputs. In the
linear case this important problem is called "broadband matching" [5-8]. We
note that in both the linear and nonlinear cases the problem can be viewed
as compensating or coupling to a predetermined load using lossless elements
in such a way that the response approximates that of the noncausal exact
match $\hat{G}_{\mathrm{opt}}$ over the input range of interest.

For a particular drive $i_s$, the situation is somewhat different. The
optimal voltage $\hat{v}(\cdot)$ is unique, but the optimal load is not: the
only requirement on $G$ is that $G(\hat{v}) = i_s - F(\hat{v})$. In the
linear case where $F$ and $G$ are respectively represented by admittance
matrices $Y_s(j\omega)$ and $Y_L(j\omega)$, there are in general infinitely
many optimal, positive semidefinite choices of $Y_L(j\omega)$ at a given
$\omega$ for which the network is uniquely solvable [9]. The problem of
finding solutions in particular classes, such as the class of resistive
loads, is studied in [10].

2.4)  Numerical  Algorithm 

Equation (8) shows that $i_s - H(v)$ is the gradient [3,p.72], [4,p.196]
of $P$ at $v$, $\forall\, i_s, v \in \mathcal{L}_T$. This suggests that we attempt
to maximize $P$ by a simple "hill-climbing" algorithm of the form

$$x_{j+1} = x_j + \lambda\,(i_s - H(x_j)), \qquad (11)$$

for some $\lambda > 0$. Note that if, under the assumptions of Theorem 1,
$x_j \to x \in \mathcal{L}_T$ and $H$ is continuous, then $i_s = H(x)$ and $x$
globally maximizes $P$. By tightening the assumptions a little further, we can
guarantee convergence for all sufficiently small positive $\lambda$.

Theorem  2 

Strengthen the assumptions of Theorem 1 by supposing further that
$\mathcal{L}_T$ is closed and $H$ is uniformly increasing and Lipschitz
continuous on $\mathcal{L}_T$. (See (2), (3).) Then for any
$i_s \in \mathcal{L}_T$, any initial guess $x_0 \in \mathcal{L}_T$, and any
$\lambda \in (0, 2\delta/K^2)$, the sequence generated by (11) converges to
$\hat{v}(i_s)$.

Remark

Note that Theorem 2 also guarantees existence of a solution to (6) for
all $i_s \in \mathcal{L}_T$, i.e., $H(\mathcal{L}_T) = \mathcal{L}_T$.


Proof 

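Since $\mathcal{L}_T \subset L_T^2$ is closed and $L_T^2$ is complete,
$\mathcal{L}_T$ is complete [11]. It remains to show that the iteration map
$M: x \mapsto x + \lambda\,(i_s - H(x))$ of (11) is contractive, i.e., that for
some $c < 1$,

$$\| M(y) - M(x) \|_T \le c\, \| y - x \|_T, \quad \forall\, x, y \in \mathcal{L}_T, \qquad (12)$$

to guarantee $\| x_j - \hat{v}(i_s) \|_T \to 0$ by the contraction mapping
theorem [3,p.102], [12,p.28]. But

$$\| M(y) - M(x) \|_T^2 = \langle\, y - x - \lambda(H(y) - H(x)),\; y - x - \lambda(H(y) - H(x)) \,\rangle_T$$
$$= \| y - x \|_T^2 - 2\lambda\, \langle H(y) - H(x),\, y - x \rangle_T + \lambda^2\, \| H(y) - H(x) \|_T^2$$
$$\le (1 - 2\lambda\delta + \lambda^2 K^2)\, \| y - x \|_T^2 \;\triangleq\; c^2(\lambda)\, \| y - x \|_T^2,$$

and

$$c^2(\lambda) < 1, \quad \forall\, \lambda \in (0, 2\delta/K^2).$$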
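As a short worked step that is not part of the original proof, the contraction
factor of Theorem 2 can be minimized over the step size. Here $\delta$ is taken
to be the uniform-increase constant and $K$ the Lipschitz constant of $H$; the
symbol $\lambda^{*}$ is introduced only for this illustration.

```latex
c^2(\lambda) = 1 - 2\lambda\delta + \lambda^2 K^2, \qquad
\frac{d\,c^2}{d\lambda} = -2\delta + 2\lambda K^2 = 0
\;\Longrightarrow\;
\lambda^{*} = \frac{\delta}{K^2}, \qquad
c^2(\lambda^{*}) = 1 - \frac{\delta^2}{K^2}.
```

Since $\delta \le K$ (by the Cauchy-Schwarz inequality), $c^2(\lambda^{*})$ lies
in $[0, 1)$, and the admissible interval $(0, 2\delta/K^2)$ is symmetric about
$\lambda^{*}$. This concerns only the bound used in the proof; the step size
that is fastest in practice may differ.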


III.  Examples 


3.1)  Linear  Operators  and  Memoryless  Operators 


Consider the time-invariant scalar case for simplicity, and let
$\mathcal{L}_T$ stand for $L_T^2$.

If $F_a$ is the convolution operator $v \mapsto a * v$, where
$a: \mathbb{R} \to \mathbb{R}$ is absolutely integrable, then for each $T > 0$,
$F_a$ is a continuous linear operator $L_T^2 \to L_T^2$ and therefore Gateaux
(in fact, Frechet) differentiable. Since $F_a$ is linear, $DF_a(x) = F_a$, and
the reader can easily verify that
$(DF_a(x))^{adj} = F_a^{adj}: v(\cdot) \mapsto a(-\cdot) * v(\cdot)$, i.e., the
adjoint operation turns the impulse response around in time. Furthermore,
$H_a: v(\cdot) \mapsto [a(\cdot) + a(-\cdot)] * v(\cdot)$ is strictly increasing
on $L_T^2$ for each $T > 0$ iff $\mathrm{Re}\{\hat{a}(j\omega)\} > 0$ for all
$\omega$, where $\hat{a}$ is the Fourier transform of $a$. This follows from a
slight modification of [12: pp. 25, 174, 235]. Similar results hold if
$a(\cdot)$ contains impulse functions as well [12: pp. 246-247]. Thus
$\hat{G}_{opt}: v(\cdot) \mapsto a(-\cdot) * v(\cdot)$, and $\hat{G}_{opt}$ is
represented in the frequency domain by the complex admittance
$\hat{a}^{*}(j\omega)$. Therefore Theorem 1 and equation (10) reduce to the
standard result $Y_{load}(j\omega) = Y^{*}_{source}(j\omega)$ when $F$ is linear
and time-invariant.

Suppose $F_b$ is memoryless but possibly nonlinear, i.e., the source network
is a resistor with the constitutive relation $i = b(v)$. Assume that
$b: \mathbb{R} \to \mathbb{R}$ is differentiable and its derivative $b'(\cdot)$
is bounded. Then $b$ is Lipschitz continuous on $\mathbb{R}$ and hence for each
$T > 0$ the operator $F_b: v(t) \mapsto b(v(t))$ maps $L_T^2$ into $L_T^2$.
Using Prop. 13 of [13: p. 85] and the Lebesgue Convergence Theorem [13: p. 88],
one can show that $F_b$ is Gateaux differentiable on $L_T^2$ and that for all
$x, y \in L_T^2$, $(DF_b(x))y = y(\cdot)\, b'(x(\cdot)) \in L_T^2$. Furthermore
if $h: v \mapsto b(v) + v\, b'(v)$ is a strictly increasing function on
$\mathbb{R}$, then $v(t) \mapsto b(v(t)) + v(t)\, b'(v(t))$ is a strictly
increasing operator $L_T^2 \to L_T^2$. Thus Theorem 1 reduces in this case to
the result in [2].

The reader can easily check that $DF_b: L_T^2 \to L(L_T^2, L_T^2)$ is not
continuous unless $b'(\cdot)$ is constant. Thus if the source is a resistor
with any nonlinearity (other than the trivial $i = gv + i_0$), $F_b$ is not
Frechet differentiable [4, Chap. 3] on $L_T^2$. This is the reason Theorem 1
was formulated in terms of the weaker Gateaux derivative.

3.2)  Positive  Linear  Combinations  of  Operators 

The (noncausal) matched load (10) for the source admittance $F$ is related
to $F$ by a mapping $\ell$, $\ell(F) = \hat{G}_{opt}: v \mapsto (DF(v))^{adj}\, v$.
Note that $\ell$ is linear; i.e. $\ell(aF_1 + bF_2) = a\,\ell(F_1) + b\,\ell(F_2)$.
Given $F_1$ and $F_2: \mathcal{L}_T \to \mathcal{L}_T$, consider
$F = aF_1 + bF_2$. The reader can easily verify that if $F_1$, $F_2$ satisfy the
conditions of Theorem 1 (resp. Theorem 2), then $F$ also satisfies Theorem 1
(resp. Theorem 2), provided $a \ge 0$, $b \ge 0$, $a + b > 0$.

For example, consider the source shown in Fig. 2, which consists of the
parallel connection of an LTI 1-port and a nonlinear resistor. If $Y$ and $g$
satisfy the conditions in section 3.1, then the (noncausal) matched load
has the form shown in Fig. 2.

3.3)  Circuit  Example 

Suppose the source takes the specific form in Fig. 3, with the resistor
curves shown in Fig. 4. The convolution kernel $a(t) = e^{-t}$, $t > 0$, for the
series connection of inductor and resistor satisfies the assumptions of
section 3.1. The resistor curves are differentiable everywhere and

$$h_k(v) \triangleq g_k(v) + v\, g_k'(v) = (k+1)\, v\, |v|^{k-1}, \quad k = 1, 2, 3, \qquad (13)$$

with $h_k(0) = 0$. All the assumptions of section 3.1 are satisfied except
that the derivatives $g_2'(\cdot)$ and $g_3'(\cdot)$ are unbounded. (Since they
are bounded on every bounded subset of $\mathbb{R}$, a more detailed argument,
omitted here, shows that the solutions obtained below maximize $P$ over
$L_T^2 \cap L^{\infty}$, which is certainly sufficient in practice.)

To find the optimal output $\hat{v}$ in the three cases, we carried out the
iterative procedure (11), which becomes in this instance

$$x_{j+1}(t) = x_j(t) + \lambda \left[\, 6 \sin(t) - (k+1)\, x_j(t)\, |x_j(t)|^{k-1} - \int_{-\infty}^{\infty} e^{-|t-\tau|}\, x_j(\tau)\, d\tau \,\right], \quad k = 1, 2, 3. \qquad (14)$$

Miss Pearl Yew of MIT has written a program in PASCAL to do the numerical
solution. It was run on the DEC 20 in MIT's Research Laboratory of Electronics
with an initial guess of $x_0(\cdot) = 0$, and found to converge fairly rapidly
for small positive values of $\lambda$. The results are shown in Fig. 5.

Since $g_1$ represents a linear resistor, it follows from the traditional
linear theorem that $\hat{v}(t) = 2 \sin(t)$ for $k = 1$, in agreement with the
numerical solution. Note that the instantaneous current drained by the
nonlinear source resistor increases in magnitude with $k$ for $|v| > 1$ but
decreases for $|v| < 1$. Thus it is intuitively reasonable that the optimal
output spends a progressively greater percentage of time in the region
$|v| < 1$ as $k$ increases.
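The hill-climbing iteration for this circuit can be sketched in a few lines of
Python. This is a minimal illustration, not the PASCAL program mentioned in the
text. It assumes the drive is 6 sin(t) with period 2*pi, samples one period on
a uniform grid, and replaces the convolution with e^{-|t|} by a rectangle-rule
sum against the kernel folded onto one period. The names `periodized_kernel`
and `hill_climb`, the grid size, and the iteration count are all choices made
for this sketch.

```python
import math


def periodized_kernel(s):
    """Sum over m of exp(-|s + 2*pi*m|) for s in [0, 2*pi).

    Folding e^{-|t|} onto one period lets the convolution of a 2*pi-periodic
    signal be evaluated as an integral over a single period.
    """
    return math.cosh(math.pi - s) / math.sinh(math.pi)


def hill_climb(k, lam, n_samples=128, n_iter=80):
    """Iterate x <- x + lam * (i_s - H(x)) on one period of the signal.

    H(x) = (k+1) x |x|^(k-1) + (e^{-|.|} * x), and i_s(t) = 6 sin(t).
    Returns the sample times and the final iterate.
    """
    h = 2.0 * math.pi / n_samples
    t = [h * n for n in range(n_samples)]
    drive = [6.0 * math.sin(tn) for tn in t]
    # Quadrature weights for the periodic convolution (rectangle rule).
    kern = [h * periodized_kernel(h * n) for n in range(n_samples)]
    x = [0.0] * n_samples  # initial guess x_0 = 0
    for _ in range(n_iter):
        conv = [sum(kern[(n - m) % n_samples] * x[m] for m in range(n_samples))
                for n in range(n_samples)]
        x = [x[n] + lam * (drive[n]
                           - (k + 1) * x[n] * abs(x[n]) ** (k - 1)
                           - conv[n])
             for n in range(n_samples)]
    return t, x
```

For k = 1 the operator H is linear with gain between 2 and about 4, so a step
such as `lam = 0.2` lies inside the interval allowed by Theorem 2, and the
iterate approaches 2 sin(t) up to discretization error. For k = 2 or 3 the
derivative of the resistor term grows with |v|, so a smaller step (for example
0.05) is the safer choice.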
Fig. 1 The solution of the operator equation (6), given a particular
$i_s(\cdot)$, is the optimal output voltage $\hat{v}(\cdot)$. It can be
achieved with a variety of loads.

Fig. 2 The optimal load admittance is obtained by a linear operator $\ell$
on the source admittance. Thus the optimal load for a parallel
connection of source admittances is the parallel connection of the
optimal loads for each source separately.

Fig. 3 Theorems 1 and 2 let us numerically determine the optimal output
voltage $\hat{v}(\cdot)$ for this circuit when the resistor curves are as
shown in Fig. 4.

Fig. 4 The three resistor curves for the circuit in Fig. 3 are
$g_k(v) \triangleq v\, |v|^{k-1}$, $k = 1, 2, 3$, with $g_k(0) \triangleq 0$.

Fig. 5 One period of the optimal output voltages for the circuit in Fig. 3.
Electrical characteristics of Be-implanted GaAs diodes annealed
with an ultrahigh power argon arc lamp

K. Tabatabaie-Alavi, A. N. M. Masum Choudhury, H. Kanbe,a) and C. G. Fonstad

Department of Electrical Engineering and Computer Science and Center for Materials Science and
Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

J. C. Gelpey

Eaton Ion Beam Systems Division, Beverly, Massachusetts 01915
(Received 5 May 1983; accepted for publication 8 July 1983)

The potential of arc lamp annealing techniques in GaAs device processing is demonstrated by the
fabrication of Be-implanted mesa pin diodes. Implants were done at 50 and 120 keV with doses of
$4.4 \times 10^{14}$ and $5.1 \times 10^{14}$ cm$^{-2}$, respectively (total dose
$= 9.5 \times 10^{14}$ cm$^{-2}$) into a 14-µm-thick undoped
($N_D - N_A \approx 7.5 \times 10^{14}$ cm$^{-3}$) GaAs epitaxial layer grown by vapor
phase epitaxy. Ten-second annealing cycles with peak temperatures of 950 and 1050 °C have
been studied. The electrical characteristics of these diodes are superior to published
furnace-annealed, Be-implanted GaAs diodes.

PACS numbers: 73.40.Lq, 81.40.Ef, 85.30.De, 61.70.Tm


Rapid thermal annealing (RTA) is emerging as a powerful
technique in both Si and GaAs processing. Very short
annealing periods, resulting in minimum dopant redistribution,
and very high throughput are features of RTA ideally
suited for large volume fabrication of very high speed integrated
circuits (IC's). Several researchers have reported rapid
thermal annealing of GaAs using halogen lamps, graphite
strip heaters, and incandescent lamps.$^{1-3}$ We have previously
reported annealing of Be-, Zn-, and Si-implanted GaAs
using a 100-kW water-walled dc argon arc lamp.$^4$

The characterization of these rapidly annealed implanted
layers has so far been limited to depth profiling of carrier
concentration and mobility. Although metal Schottky field-effect
transistors (MESFET's) have been fabricated by rapid
thermal annealing,$^1$ MESFET's are majority-carrier devices
and their low-frequency characteristics depend primarily on
mobility and carrier concentration. McLevige et al.$^5$ have
shown that while good electrical activation and mobility can
be obtained upon annealing of Be-implanted GaAs at
600 °C, the integrated photoluminescence intensity compared
to unimplanted samples is very low unless 30 min of
annealing at 900 °C is carried out. Photoluminescence intensity
is, however, only indirectly related to electrical characteristics.
For example, if we assume that ion implantation
introduces a single Shockley-Read-Hall trap level, the nonradiative
recombination lifetime $\tau_{nr}$ will be inversely proportional
to the trap level density $N_T$. The spontaneous emission
lifetime $\tau_{sp}$, which relates to photoluminescence intensity, is
given by $(\tau_{sp})^{-1} = (\tau_r)^{-1} + (\tau_{nr})^{-1}$, where $\tau_r$ is the radiative
recombination lifetime. On the other hand, the reverse leakage
current of a GaAs or Si p-n diode is directly proportional
to the trap density $N_T$ in the space-charge region. Leakage
current $I_R$, forward saturation current density $J_s$, ideality
factor $n$, and maximum electric field at breakdown are generally
accepted as being parameters that are very sensitive to
the residual implant damage in ion-implanted p-n diodes.$^{6-8}$
In this letter we compare the characteristics of arc lamp annealed
Be-implanted GaAs pin diodes with the best published
furnace-annealed diodes. To our knowledge, detailed
characterization of rapid-thermal-annealed GaAs pin diodes
has not been previously reported.

a) Permanent address: Musashino Electrical Communication Laboratory,
Nippon Telegraph and Telephone Public Corporation, Musashino-shi,
Tokyo, 188 Japan.

Donnelly et al.$^8$ have fabricated Be-implanted diodes in
vapor phase epitaxy (VPE) GaAs ($N_D - N_A = 3 \times 10^{14}$ cm$^{-3}$).
They obtained an average electric field of
$1.5 \times 10^{5}$ V/cm at breakdown and observed clear Be diffusion.
No more details of their diode characteristics were reported.
Helix et al.$^9$ implanted Be into VPE GaAs
($N_D - N_A \approx 3 \times 10^{14}$ cm$^{-3}$) at 250 keV with a $10^{14}$ cm$^{-2}$
dose. Annealing was done at 900 °C for 30 min using a silicon
nitride cap. They obtained an ideality factor of 1.6 and a
saturation current density of $1.56 \times 10^{-11}$ A/cm$^2$. However,
the reverse leakage current increased exponentially with reverse
voltage having a value of 19 nA at 0.9 times the breakdown
voltage $V_B$ for a 250-µm-diam diode. Milano et al.$^{10}$
made a detailed study of essentially the same diode structure
as reported in Ref. 9 in both VPE and liquid phase epitaxially
(LPE) grown material. Diodes made on VPE material had
either a large leakage current and soft breakdown or were
similar to the ones fabricated in Ref. 9. Although they obtained
sharp breakdown and low leakage current for diodes
made in LPE material, they observed a linear increase of
reverse leakage current with reverse voltage and deduced an
electron lifetime = 6 ps in the Be-implanted $p^+$ region. Such
a short electron lifetime is an indication of considerable residual
implant damage after annealing.

Be ion implantation in this work was done at 50 and 120
keV at doses of $4.4 \times 10^{14}$ and $5.1 \times 10^{14}$ cm$^{-2}$, respectively.
Implants were made into a VPE grown undoped ($N_D - N_A
\approx 8 \times 10^{14}$ cm$^{-3}$) GaAs layer with a nominal thickness of 14
µm. 120 nm of SiO$_2$ was sputter-deposited on both faces and
samples were annealed for 10 s with maximum temperatures
of 950 and 1050 °C following the time-temperature cycles
illustrated in Fig. 1. The temperature plotted in Fig. 1 was
obtained from an optical pyrometer viewing the back surface
of the sample, i.e., the surface not facing the lamp. The Be
concentration and mobility profiles in semi-insulating undoped
GaAs that was implanted following the same implant
schedule as that used for the diodes, and that was annealed
following cycle A in Fig. 1 are shown in Fig. 2 (Ref. 4).

FIG. 1. Time-temperature cycles used in the arc lamp annealing of the beryllium
implants used in this study. The two curves shown correspond to
different lamp currents and thus to different peak temperatures: (a) 950 °C
and (b) 1050 °C. The temperature value is the reading from an optical pyrometer
viewing the back of the wafer.


FIG. 2. Room-temperature carrier concentration and mobility profiles
(plotted against depth in µm) measured by differential Hall technique on
Be-implanted ($4.4 \times 10^{14}$ cm$^{-2}$, 50 keV and $5.1 \times 10^{14}$ cm$^{-2}$,
120 keV) GaAs samples arc lamp annealed for 10 s to 950 °C.
FIG. 3. Room-temperature current-voltage characteristics of three different
diameter GaAs mesa pin diodes. The reverse bias characteristics are
shown for 75-, 250-, and 500-µm-diam mesas (voltage scale at bottom); the
forward characteristic is shown for the 500-µm-diam mesa only (voltage
scale at top).

Contact windows were opened through the oxide and
Au/Zn/Au was electroplated and sintered at 420 °C to form
Ohmic contact to the $p^+$ regions.$^{11}$ Mesa diodes with four
different diameters, 75, 125, 250, and 500 µm, were fabricated.
The junction depth in the VPE diodes was measured
to be 1-1.1 µm using standard cleaving and staining with a
1:1:10 solution of HF:H$_2$O$_2$:H$_2$O for 15 s with intense
illumination.$^{12}$ This is consistent with the expected profile in Fig.
2, and indicates that no damage enhanced diffusion of Be,
which is usually observed with furnace annealing of
$10^{14}$-$10^{15}$ cm$^{-2}$ dose levels,$^{8,10}$ has occurred.

Capacitance-voltage (C-V) measurements on these diodes
indicate an abrupt junction with a built-in voltage of 1.24 V.
The electron carrier concentration deduced from these measurements
varies from $6.4 \times 10^{14}$ cm$^{-3}$ at 1.64 µm (zero bias)
to $8.5 \times 10^{14}$ cm$^{-3}$ at 3.8 µm from the metallurgical junction
(-7 V bias). The epitaxial layer thickness (including the $p^+$ region)
changes from 14 to 15 µm across the sample (13 × 13
mm). Such changes in thickness and doping are not unexpected
for a thick VPE-grown layer.

The avalanche breakdown voltage $V_B$ of all of the diodes
was 200 ± 25 V across the sample. To calculate the
punchthrough voltage and maximum electric field at avalanche
breakdown, we assume an average epilayer thickness
of 13.5 µm (excluding the $p^+$ region) and average doping
level of $7.5 \times 10^{14}$ cm$^{-3}$. Thus, the punchthrough voltage is
≈ 100 V. The maximum electric field at breakdown (200 V)
is $2.2 \times 10^{5}$ V/cm which is close to the theoretically predicted
value of $2.8 \times 10^{5}$ V/cm deduced from Ref. 13 for such a
doping level.
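The two numbers quoted above can be checked with a short calculation. The
sketch below assumes a one-sided abrupt junction with a uniformly doped
epilayer that is fully depleted at punchthrough, neglects the built-in
voltage, and uses a relative permittivity of 12.9 for GaAs. None of these
modelling choices is stated in the letter, and the function names are
illustrative only.

```python
Q = 1.602e-19        # electron charge, C
EPS0 = 8.854e-14     # vacuum permittivity, F/cm
EPS_R_GAAS = 12.9    # assumed relative permittivity of GaAs (not from the text)


def punchthrough_voltage(n_cm3, w_cm, eps_r=EPS_R_GAAS):
    """Voltage that just depletes a uniformly doped layer of thickness w_cm."""
    return Q * n_cm3 * w_cm ** 2 / (2.0 * eps_r * EPS0)


def max_field_beyond_punchthrough(v, n_cm3, w_cm, eps_r=EPS_R_GAAS):
    """Peak field (V/cm) at the junction once the layer is fully depleted.

    The field profile is a trapezoid: the average field is v / w_cm and the
    linear slope q*N/eps adds half of its total drop at the junction side.
    """
    return v / w_cm + Q * n_cm3 * w_cm / (2.0 * eps_r * EPS0)


doping = 7.5e14        # cm^-3, average doping level from the text
thickness = 13.5e-4    # cm, average epilayer thickness from the text
v_pt = punchthrough_voltage(doping, thickness)
e_max = max_field_beyond_punchthrough(200.0, doping, thickness)
```

Under these assumptions `v_pt` comes out a little under 100 V and `e_max`
close to 2.2e5 V/cm, consistent with the values in the text.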

TABLE I. Reverse leakage current at punchthrough (-100 V) and 0.9 $V_B$.
(Exponents shown as ? are not legible.)

Diode diameter   Annealing cycle   Leakage current (A) at
(µm)             (Fig. 1)          -100 V           0.9 V_B
75               A                 4.6 × 10^-?      1.6 × 10^-?
125              A                 1.1 × 10^-?      4.0 × 10^-?
250              A                 1.5 × 10^-?      4.8 × 10^-?
500              A                 5.8 × 10^-?      2.5 × 10^-?
500              B                 1.3 × 10^-?      2.7 × 10^-?

The forward and reverse characteristics of 75-, 125-,
250-, and 500-µm-diam diodes annealed for 10 s at maximum
temperature of 950 °C (cycle A in Fig. 1) are shown in
Fig. 3. From the forward bias characteristics of a 500-µm-diam
diode (0.75-1.0 V) we calculate an ideality factor of 1.6
and a saturation current density of $1.4 \times 10^{-11}$ A/cm$^2$. The
room-temperature reverse leakage currents of four different
diameter diodes at punchthrough (-100 V) and 0.9 $V_B$ are
listed in Table I. The temperature dependence of the reverse
leakage current $I_R$ at punchthrough (-100 V), and the avalanche
breakdown voltage $V_B$ have also been measured on a
typical diode and are found to be given, respectively, by

$$I_R \propto \exp(-E_a / kT),$$

with $E_a = 0.354$ eV for 296 K < T < 421 K, and

$$V_B = 137 + 1.5\,(T - 273)\ \mathrm{V},$$

where $k$ is the Boltzmann constant and 296 K < T < 395 K.
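To give a feel for the size of the quoted activation energy, the sketch below
evaluates the Arrhenius factor between the two ends of the measured
temperature range. It assumes the prefactor of the leakage current is
temperature independent, which the letter does not state; the function name
and the rounded Boltzmann constant are choices made for this illustration.

```python
import math

K_B_EV = 8.617e-5  # Boltzmann constant in eV/K


def leakage_ratio(t_hot, t_cold, e_a=0.354):
    """Ratio I_R(t_hot) / I_R(t_cold) for I_R proportional to exp(-E_a / kT)."""
    return math.exp(e_a / K_B_EV * (1.0 / t_cold - 1.0 / t_hot))


ratio = leakage_ratio(421.0, 296.0)
```

With $E_a = 0.354$ eV the leakage current at punchthrough rises by roughly a
factor of sixty between 296 K and 421 K.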

These diodes are superior to furnace-annealed diodes in
several aspects. The reverse leakage current is low and has a
normal behavior. No enhanced diffusion of Be occurs even at
a dose level of $9.5 \times 10^{14}$ cm$^{-2}$. Furthermore, the forward
saturation current density and ideality factor are reasonable
for a large band-gap material like GaAs and the maximum
electric field at breakdown is close to the theoretically predicted
value.

In conclusion, we have shown the potential of the arc
lamp RTA technique in the fabrication of GaAs pin diodes,
devices which are sensitive to residual implant damage.
ABSTRACT

The waveform bounding approach to fast timing
analysis of MOS VLSI circuits is discussed. The
idea is to compute rigorous closed-form expressions
giving upper and lower bounds for transient voltage
waveforms, rather than exact values. The goal is
to enable rapid computation without sacrificing
user confidence in the results.

I. Background and Objectives

Existing approaches to timing analysis and
simulation of digital integrated circuits fall,
roughly speaking, into three classes:

i) Methods such as SPICE2 [1] and ASTAP [2],
based on essentially exact numerical solution of
the network's differential equations, are accurate
and reliable. But even with the increase in speed
afforded by the waveform relaxation method [3],
exact numerical solution is too slow for the needs
of the VLSI era.

ii) Specialized MOS timing simulators like
MOTIS-C [4] and SPLICE [5] rely on table lookup of
device characteristics for speed, and save additional
time by terminating a Newton-Raphson or similar
iteration before convergence is reached. SPLICE
is in addition a mixed-mode circuit, timing and
logic simulator and uses a selective trace algorithm
to exploit latency. In both these programs
the termination of an iterative step prior to convergence
saves time at the cost of accuracy and, in
some instances, of numerical stability [6]. The
improvement in speed over SPICE2 is typically one
to two orders of magnitude for SPLICE [5] and about
two orders of magnitude for MOTIS-C [7].

iii) More recently, some researchers are exploring
an alternate approach to timing analysis
and simulation based on a radically simplified
electrical description of the network. RSIM [8],
Crystal [9], and TV [10,11] fall at the far end of
the speed-accuracy tradeoff curve from SPICE2. A
MOSFET is typically represented in these programs by
an extremely simplified model: a linear resistor in
series with a switch. And a polysilicon or diffusion
line is represented by a lumped capacitance
in RSIM, or by a delay in TV obtained by simply
averaging the upper and lower delay bounds obtained
by Rubinstein, Penfield, and Horowitz [12]. These
programs are potentially very fast and have a number
of attractive user-oriented features. The
drawback, of course, is that there are no absolute
known limits to the error in their total delay
estimates. The user can never be sure the answers
they give are close enough.

The objective of the waveform bounding
approach to timing analysis and simulation is to
combine the computational speed that results from
avoiding the numerical solution of differential
equations with the user confidence in the result
that comes from rigorous uncertainty bounds. Our
attack on the timing analysis problem is based on
a careful fundamental study of the differential
equations describing the dynamics of gates, pass
transistors, interconnect, and the standard digital
circuits constructed from them.

In addition to the MIT group working on this
project, Mark Horowitz [12,13] is currently completing
a dissertation on MOS timing analysis at
Stanford.

II. Response Bounds for Interconnect

2.1) Linear Interconnect Models

This section summarizes the results obtained
in [12]. In this work an MOS signal distribution
network as shown in Fig. 1 is modelled as a
branched linear RC line, i.e., an RC tree, as in
Fig. 2.

Figure 1. Typical MOS signal-distribution network.
The inverter is shown driving three gates.

Figure 2. The linear RC tree shown above is a model
for the network of Fig. 1. The voltage
source is a unit step at time t = 0.

For any two nodes $k$ and $m$ in the network, $R_{km}$ is defined as
the sum of the resistances along the route consisting of
the intersection of the path from the input to node $k$ with
the path from the input to node $m$, as illustrated in Fig. 3.

Figure 3. Illustration of resistance terms. For this
network, $R_{ki} = R_1 + R_2$, $R_{kk} = R_1 + R_2 + R_3$,
and $R_{ii} = R_1 + R_2 + R_5$.

The three time constants used to derive response bounds are

$$T_P = \sum_k R_{kk}\, C_k, \qquad (1)$$

$$T_{Di} = \sum_k R_{ki}\, C_k, \qquad (2)$$

$$T_{Ri} = \Big( \sum_k R_{ki}^2\, C_k \Big) \Big/ R_{ii}, \qquad (3)$$

where the summations are taken over all nodes of the
network. The derivation in [12] shows that
$\underline{v}_i(t) \le v_i(t) \le \overline{v}_i(t)$ for all $t \ge 0$, where $v_i(t)$
is the actual zero state step response at any terminal node $i$, and
the bounds $\underline{v}_i(t)$ and $\overline{v}_i(t)$ are given by

$$\underline{v}_i(t) = \begin{cases}
0, & 0 \le t \le T_{Di} - T_{Ri}, \\
1 - \dfrac{T_{Di}}{t + T_{Ri}}, & T_{Di} - T_{Ri} \le t \le T_P - T_{Ri}, \\
1 - \dfrac{T_{Di}}{T_P} \exp[(T_P - T_{Ri} - t)/T_P], & T_P - T_{Ri} \le t,
\end{cases} \qquad (4)$$

$$\overline{v}_i(t) = \begin{cases}
1 - \dfrac{T_{Di} - t}{T_P}, & 0 \le t \le T_{Di} - T_{Ri}, \\
1 - \dfrac{T_{Ri}}{T_P} \exp[(T_{Di} - T_{Ri} - t)/T_{Ri}], & T_{Di} - T_{Ri} \le t,
\end{cases} \qquad (5)$$

as illustrated in Fig. 4.
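The time constants and the two bounds can be evaluated directly from a tree
description, as in the sketch below. It assumes a tree stored as a parent
array, where `parent[j]` is the node feeding node `j` (-1 for the input) and
`res[j]` is the resistor on that edge; this data layout and all function names
are assumptions of the illustration. For clarity it computes each $R_{km}$ by
walking paths, which costs more than the linear-time evaluation the text
refers to.

```python
import math


def common_path_resistance(parent, res, k, m):
    """R_km: resistance shared by the input->k and input->m paths."""
    ancestors_of_k = set()
    node = k
    while node != -1:
        ancestors_of_k.add(node)
        node = parent[node]
    total = 0.0
    node = m
    while node != -1:
        if node in ancestors_of_k:
            total += res[node]  # edge into `node` lies on both paths
        node = parent[node]
    return total


def time_constants(parent, res, cap, i):
    """Return (T_P, T_Di, T_Ri) for output node i."""
    nodes = range(len(parent))
    r_ii = common_path_resistance(parent, res, i, i)
    t_p = sum(common_path_resistance(parent, res, k, k) * cap[k] for k in nodes)
    t_d = sum(common_path_resistance(parent, res, k, i) * cap[k] for k in nodes)
    t_r = sum(common_path_resistance(parent, res, k, i) ** 2 * cap[k]
              for k in nodes) / r_ii
    return t_p, t_d, t_r


def lower_bound(t, t_p, t_d, t_r):
    """Lower bound on the unit step response at time t >= 0."""
    if t <= t_d - t_r:
        return 0.0
    if t <= t_p - t_r:
        return 1.0 - t_d / (t + t_r)
    return 1.0 - (t_d / t_p) * math.exp((t_p - t_r - t) / t_p)


def upper_bound(t, t_p, t_d, t_r):
    """Upper bound on the unit step response at time t >= 0."""
    if t <= t_d - t_r:
        return 1.0 - (t_d - t) / t_p
    return 1.0 - (t_r / t_p) * math.exp((t_d - t_r - t) / t_r)
```

For a two-section line with unit resistors and capacitors,
`time_constants([-1, 0], [1.0, 1.0], [1.0, 1.0], 1)` gives
$T_P = 3$, $T_{Di} = 3$, $T_{Ri} = 2.5$. For a single RC section all three
constants equal RC and both bounds coincide with the exact exponential.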


Figure 4. Form of the bounds with the distance from the
exact solution exaggerated for clarity.

The time required to compute these bounds grows only
linearly with the number of elements in the network.
Recent applications of this result include [10,11,14,15,16].
The ultimate goal of this portion of the project
is to derive a hierarchy of such bounds, permitting the
user to trade off accuracy for computation time.

2.2) Nonlinearities Affecting Interconnect

The linear circuit model in Fig. 2 fails to
incorporate three types of nonlinearities present in
Fig. 1 or related circuits: the nonlinear output
resistance of the inverter, the nonlinear gate-to-channel
capacitance of the MOSFET loads, and the nonlinear
capacitance from any diffusion line to substrate.
This section describes recent work [17-20] that allows
the bounds for linear networks [12] to be applied to
RC lines incorporating such nonlinearities. (Further
research is needed for branched lines, i.e. RC trees.)

Figure 5. Two-capacitor example of a nonlinear, nonuniform
RC line.

Using the notation and sign conventions illustrated
in Fig. 5, the state equations for any nonuniform, nonlinear
lumped RC line with N capacitors can be written in the form

[state equations not legible in the source], $1 \le j \le N$, (6)

where [...] the capacitor constitutive relations $q_j = h_j(v_j)$
are continuously differentiable with $C_j(v_j) \triangleq h_j'(v_j) > 0$
everywhere. We assume the resistor curves are continuously
differentiable, strictly increasing, and pass through the origin.

Lemma 1 [19]

Consider any nonlinear, nonuniform RC line. At any
instant during an "up" transition (i.e. $e \ge 0$, $\dot{e} \ge 0$)
from equilibrium,

$$i_j(t) \ge 0, \quad v_j(t) \le e(t), \quad v_j(t) \le v_{j-1}(t), \quad \text{and} \quad \dot{v}_j(t) \ge 0, \qquad 1 \le j \le N.$$

Lemma 1 is proved in [19]. Using it, we give a
proof in [20] of the

Monotone Response Theorem for Nonlinear, Nonuniform
RC Lines: Given a nonlinear RC line as described above.
Suppose that (because of circuit parameter uncertainty,
the use of linearized models for nonlinear elements,
replacing the exact input by input bounds, etc.) we
do one or more of the following:

a) overestimate the input e(t),

b) underestimate one or more R's,

c) underestimate one or more C's.

The resulting circuit model will then necessarily overestimate
the output $v_j(t)$ at each instant t during "up"
transitions (i.e., during transitions where $e \ge 0$,
$\dot{e} \ge 0$ throughout).

A similar result holds for "down" transitions and
estimate errors of the opposite sign. Using part a) of
the assumptions, this theorem allows us to computationally
propagate upper and lower signal bounds through the network.
Using parts b) and c), it allows us to replace a nonlinear
line by two linear ones, one strictly faster and one
strictly slower, to which the linear network bounds (4,5)
in turn apply. We have not yet succeeded in finding a
generalization of this result that will apply to nonlinear
RC trees.

III. An Approach to Waveform Bounding
for MOS Logic Gates

The results reported here apply to MOS device models
of the form

$$i_D = \frac{W}{L}\, f(v_{GB}, v_{DB}, v_{SB}), \qquad (7)$$

where D, G, S and B refer to drain, gate, source and substrate,
respectively. For specificity we consider only
n-channel devices in this paper. No special algebraic
form for f is assumed, only that f is continuously
differentiable and satisfies the natural monotonicity
conditions

$$\frac{\partial f}{\partial v_{GB}} \ge 0, \quad \frac{\partial f}{\partial v_{DB}} \ge 0, \quad \frac{\partial f}{\partial v_{SB}} \le 0 \qquad (8)$$

everywhere. Thus a wide variety of device models are
allowed, with the exception that (7) does not allow for
short-channel effects.

Our approach will be to reduce a multiple-input
logic gate by steps to an "equivalent bounding inverter"
and then to find bounds for the response of this inverter.

3.1) Reduction of Series-Parallel Transistor Network
to "Equivalent Bounding Transistor"

We  have  developed  a  method  for  reducing  any  series- 
parallel  transistor  network  to  a  single  "equivalent 
bounding  transistor."  Using  the  technique  recursively, 
one  can  replace  the  pull up  or  pulldown  network  of  a 
multiple-input  gate  by  a  single  transistor  and  have 
rigorous  bounds  for  the  error  produced  by  this  simpli¬ 
fication. 

For  example,  a  parallel  connection  of  N  transistors, 
ill  identical  except  for  widths,  lengths  and  gate 
voltages,  satisfies 


j:i  h  ^<"G8.' 

where  Vrg  is  the  vector  of  gate  voltages.  We  have  proven 
that,  because  of  the,  assumptions  (8)j,  there  exist  Wgg,  ^ 
leo  independent  of  vgg,  and  and  VQg  that  depend  on  v/-» 
Such  that  {|)  can  be  replace3^by  the  simpler  bounds 


j^fh 


''SB'  i  i 


^^''68’  ''08'  ''SB^' 

for  all  vng  i  v'jB,  describing  a  single  transistor  with  a 
range  of  gate  voltages.  The  function  f  is  the  same 
throughout  (9)  and  (10).  Figure  6  illustrates  this 
process  for  N  »  2. 


"equivalent  bounding  transistor".  T-g  cost  of 
this  simplification  is  that  the  exact  value  of 
i  for  the  network  on  the  left  is  replaced  by  a 
range  of  values  in  ^he  simpler  model  correspond¬ 
ing  to  Vfig  ‘  VQB  <  vgg. 

3.21  Reducing  a  Multiple-lnout  Gate  to  an  "Eouivalent 
Bounding  inverter" 

A gate can be modelled as an "equivalent bounding
inverter" by performing the reduction outlined in section
3.1 on both the pullup and pulldown networks, reducing
each to a single transistor. Initial trials, comparing
SPICE2 simulations of the original network with simulations
of the "equivalent bounding inverter", indicate that the
resulting bounds for $i_{out}(v_{out})$ differ from the exact
values by only about ± 10% for practical circuits.

3.3) Bounding the Response of an Inverter and Load to
Input  Transitions 

When applied to some multiple-input gates, the reduction
procedure described in the previous two subsections
may yield an inverter in which the pullup gate
is externally driven. But for simplicity we consider
here only the case of a standard NMOS depletion-load
inverter as in Fig. 7.



Figure  7.  Depletion-load  inverter. 

To bound the response time of the loaded inverter
we need simple bounds on the function $i_{out}(v_{out}, v_{in})$,
which is the difference of the pullup and pulldown
currents:

$$ i_{out}(v_{out}, v_{in}) = i_{pu}(v_{out}) - i_{pd}(v_{out}, v_{in}). $$

Simple linear bounds on both the pullup and pulldown
currents are shown in Fig. 8. The resulting bounds
for the output curve $i_{out}(v_{out})$ depend on $v_{in}(t)$.
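As a rough numerical sketch of how such bounds combine: if the
pullup and pulldown currents are each known to lie in an interval
at some operating point, their difference lies between the
smallest pullup minus the largest pulldown and the largest pullup
minus the smallest pulldown. The function name and the sample
values below are illustrative assumptions, not taken from the
report.

```python
def bound_output_current(pu_bounds, pd_bounds):
    """Bound i_out = i_pu - i_pd from (lower, upper) bounds on each current."""
    pu_lo, pu_hi = pu_bounds
    pd_lo, pd_hi = pd_bounds
    # The difference is smallest when the pullup is at its lower bound and
    # the pulldown at its upper bound, and largest in the opposite case.
    return (pu_lo - pd_hi, pu_hi - pd_lo)


# Hypothetical currents in microamps at one (v_out, v_in) point.
i_out_lo, i_out_hi = bound_output_current((80, 100), (20, 50))
```

The interval widens by the sum of the two input uncertainties,
which is one reason tight bounds on each current matter.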




Figure  6.  Replacing  a  parallel  transistor  network  by  an 


Figure 8. Simple linear bounds on the pullup and pulldown
currents. The latter depend on $v_{in}$, and
hence on t.

Initial simulations using this approach indicate that
the delay bounds for these simplified models differ from
the delays obtained from SPICE simulations by about
± 15%.

IV.  Further  Work  in  Progress 

Much  work  remains  to  be  done  before  the  theoretical 
basis  for  the  waveform  bounding  approach  to  timing 
analysis  is  complete.  Among  the  larger  remaining 
problems  are: 

1. extending the Penfield-Rubinstein bounds to incorporate
time-varying source resistances, such as those
modelling the pulldown current in Fig. 8,

2. finding bounds for the response of an RC tree
containing pass transistors,

3.  investigating  the  tolerance  in  the  bounds 
obtained  so  far  and  finding  tighter  ones  where  necessary, 
and 

4.  incorporating  effects  of  the  Miller  capacitance 
into  bounds. 
Abstract

We examine the problem of routing wires on a VLSI chip,
where the pins to be connected are arranged in a regular
rectangular array. We obtain tight bounds for the worst-case
"channel-width" needed to route an n × n array, and develop
provably good heuristics for the general case. An interesting
"rounding algorithm" for obtaining integral approximations to
solutions of linear equations is used to show the
near-optimality of single-turn routings in the worst case.

Problem Definition

We use a classical model wherein the chip area is considered
to be divided into a uniform n × n array of square cells. Each
cell contains p pins (connection points for logic elements). Each
instance of our routing problem specifies a collection of nets
where each net is specified as a set of pins. (Each pin is on at
most one net.) Each net is to be connected together by horizontal
and vertical wires. Unless stated otherwise, we assume that p = 1
and that each net connects exactly two pins.

A global routing problem specifies a pin placement,
so that the only remaining work is to route the wires between the
pins. For this reason, the global routing problem is a special case
of (and perhaps easier than) the general placement and routing
problem studied in [T79, L80, L81, L82, BL83].

It is common to solve a global routing problem instance P in
two steps:

(1) compute a global routing R specifying for each net the set
of cells and cell edges to be traversed by the wiring for that net,
and
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psfotMo  rut  i.44a«T.$Mts,  NSF  rut  mcs  ir-ossoc,  nsf  r«Dt  ECS 
•M0U4,  NSr  rut  MCS-SO-MSU,  NSr  srut  MCS-ll-OillT,  OARPA 
esotnet  NOOeie.SP'C.Wri,  Air  Forts  caairacl  OSR.t}-0S2S,  s  Bsatroll 
FMI— ship,  sea  as  IBM  CreCuls  Ftllnnkip. 


(2) compute a detailed routing that specifies for each net the
exact position of each wire, which follows the previously
computed global routing and satisfies the usual separation
constraints between wires, etc.

In this paper we are concerned exclusively with the problem of
finding good global routings (which we henceforth call routings).

T-Turn Routings

We are particularly concerned with t-turn routings, in which
the path for each net contains at most t turns. A one-turn
routing will have for each wire either a straight wire segment
or an "L"-shaped wire segment. The number of turns in a global
routing is the least number of turns in any detailed routing
consistent with the global routing. When horizontal and vertical
wires are implemented on distinct layers, then the number of
turns required is equal to the number of "vias" or "contact cuts"
required to join the straight-line wire segments together. In
the general case (e.g. when p > 1) we identify the number of
"turns" with the number of vias required to implement the wiring
pattern, or (equivalently) the sum for each net of the number
of cells for which the global routing for that net crosses both a
horizontal and a vertical side of the cell.

Notation: We denote the set of global routings for problem
instance P by $\Gamma(P)$. The set of t-turn global routings is
denoted by $\Gamma_t(P)$.

Example 

Figure  1  presents  an  example  of  our  global  routing  problem 
on  a  4  X  4  grid  with  8  nets.  Figure  2  presents  a  typical  solution 
to this problem, which happens to be in $\Gamma_1(P)$.


          Column
     1    2    3    4

     6    6    7    8
     5    2    1    3
     7    4    1    2
     4    8    5    3

Figure 1


Figure 2

Channel Widths

Let P denote an instance of our global routing problem and
let R denote a global routing solving P.

Notation: Let w(R) denote the maximum number of wires passing
from one cell into an adjacent one in the global routing R.

Remarks: Intuitively, w(R) is the "channel width" which is needed
to route the wires of the solution R, so we call w the "width" of
the solution R. The one-turn routing R of Figure 2 has width 3
(there are three wires between cell (2,4) and cell (3,4)). Flipping
either net 2 or net 8 to its other "L" configuration will reduce the
width to 2. The reader can convince himself that no one-turn
routing has width one by considering nets 1, 6, and 7.
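The width computation can be sketched as follows. The pin
positions are our reading of Figure 1, assuming the figure lists
pins row by row with cells addressed as (row, column); the 0/1
encoding of the two "L" configurations and all names are
assumptions made for this illustration.

```python
from collections import Counter
from itertools import product


def one_turn_width(nets, choices):
    """Width w(R) of a one-turn routing.

    nets:    list of ((r1, c1), (r2, c2)) pin pairs, 1-indexed cells.
    choices: 0 puts the horizontal leg in the first pin's row,
             1 puts it in the second pin's row.
    """
    usage = Counter()
    for ((r1, c1), (r2, c2)), choice in zip(nets, choices):
        row, col = (r1, c2) if choice == 0 else (r2, c1)
        for c in range(min(c1, c2), max(c1, c2)):
            usage[("h", row, c)] += 1   # edge between (row, c) and (row, c+1)
        for r in range(min(r1, r2), max(r1, r2)):
            usage[("v", r, col)] += 1   # edge between (r, col) and (r+1, col)
    return max(usage.values(), default=0)


# Nets 1..8 as read from Figure 1 (assumed row-by-row transcription).
FIGURE_1_NETS = [
    ((2, 3), (3, 3)), ((2, 2), (3, 4)), ((2, 4), (4, 4)), ((3, 2), (4, 1)),
    ((2, 1), (4, 3)), ((1, 1), (1, 2)), ((1, 3), (3, 1)), ((1, 4), (4, 2)),
]

# Try every combination of "L" configurations for the eight nets.
best = min(one_turn_width(FIGURE_1_NETS, c) for c in product((0, 1), repeat=8))
```

Under this transcription the exhaustive search agrees with the
remark above: no assignment of "L" configurations reaches width
one, and width 2 is attainable.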

Definition: An optimum global routing R is one that minimizes
w(R) over all global routings for the given problem instance (i.e.
over all $R \in \Gamma(P)$).

Notation: (Width of a problem instance P.) We let w(P) denote the
width of an optimal routing R for P.

Notation: (Width of the best t-turn routing for P.) We let $w_t(P)$
denote the least width of any t-turn routing R that solves P.

Notation: (Worst-case width for n × n arrays.) We let w(n)
denote the maximum width of any problem instance defined on
an n × n array.

Notation: (Restriction to t-turn routings.) We let $w_t(n)$ denote
the maximum of $w_t(P)$ for any problem instance P defined on
an n × n array.

Notation: For $p \ne 1$, we use the notations w(n,p) or $w_t(n,p)$.

Remarks: The reader will be able to distinguish the notations
w(R), w(P), and w(n) by the type of the argument.

Motivation

Our research was motivated by the following intriguing
conjecture.

Conjecture (Thompson): $w(n) = w_1(n) = \lfloor n/2 \rfloor + 1$.

This controversial-sounding conjecture states that in the worst
case we need only consider one-turn routings.

On the other hand, it is only requiring that for any problem
instance P there exist a one-turn routing R for P such that
$w(R) \le w(n)$, not that $w(R) \le w(P)$. (It is not difficult to
develop problem instances P for which $w(P) = 1$ but
$w_1(P) = \Omega(n)$.)


Results

Our major theorems are listed here; proofs and proof sketches
are generally given later.

Theorem 1. $\lfloor n/2 \rfloor \le w(n) \le n$.

Proof: For the lower bound connect (i,j) to (i, n - j) for
$1 \le j < n/2$, and consider the number of wires that must cross
from column $\lfloor n/2 \rfloor$ to $\lfloor n/2 \rfloor + 1$. For the
upper bound use any routing in $\Gamma_1(P)$ for a given instance P. ∎

Theorem 2. $\lfloor n/2 \rfloor + 1 \le w_1(n) \le \lfloor n/2 \rfloor + 2$.

Furthermore, a one-turn routing R with $w(R) \le \lfloor n/2 \rfloor + 2$
can be computed in time $O(n^3 \log(n))$.

Remarks: Theorem 2 very nearly proves part of Thompson's
conjecture. We do not know how to resolve the small difference
remaining in Theorem 2. The upper-bound proof involves the
development of an elegant algorithm of independent interest for
computing a good integral approximation to the solution of a set
of linear equalities. The following theorem states the main result
used.


Theorem 3. [The Rounding Theorem] Let A be a real-valued r × s
matrix and let $\Delta$ be a positive real number such that in every
column of A,

(i) the sum of the positive elements is $\le \Delta$, and
(ii) the sum of the negative elements is $\ge -\Delta$.

Let x be an s-vector and b an r-vector such that Ax = b. Then
there exists an integral s-vector $\hat{x}$ such that

(i) for all i, $1 \le i \le s$, either $\hat{x}_i = \lfloor x_i \rfloor$ or
$\hat{x}_i = \lceil x_i \rceil$ (i.e. $\hat{x}$ is a "rounded" version of x).

(ii) $A\hat{x} = \hat{b}$, where $\hat{b}_i - b_i \le \Delta$ for $1 \le i \le r$.

Furthermore, $\hat{x}$ can be computed from A, x, and b in time
$O(r^3 \log(s))$.
Remarks: The Rounding Theorem says that when A has only a
few nonzero entries in each column, then we can effectively
round x to a nearby integral point $\hat{x}$ while keeping $A\hat{x}$ from
increasing very much over Ax.

Theorem 4. It is NP-complete to determine, given an instance
P of our global routing problem, whether $w_1(P) \le \lceil n/2 \rceil - 2$.

Remarks: This result is perhaps surprising in view of Theorem 2;
the approximation algorithm presented there is remarkably good.

The result can also be improved, although we do not include the
details in the abstract. In particular, it is also NP-complete to
determine whether $w_1(P) \le \lceil n/2 \rceil - 1$. When p is even, it is
NP-complete to determine whether $w_1(P) < pn/2$. Given the result
proved in Theorem 5, this result is as tight as possible.

Theorem 5. When p is even, $w_1(n,p) = pn/2$.

Corollary. $\lfloor n/2 \rfloor \le w_3(n) \le \lceil n/2 \rceil + 1$.

Corollary. When p is odd, $p \lfloor n/2 \rfloor \le w_1(n,p) \le \lceil pn/2 \rceil + p$.

Remarks: Theorem 5 shows that three-turn routings can yield an
improvement (by one). The upper bound proof uses an elegant
argument based on finding Eulerian tours in an associated graph.

Theorem 6. There is a polynomial time approximation
algorithm achieving

$$ w(R) \le O\big(w(P) \cdot \log(pn / w(P))\big) $$

for any problem instance P.

Remarks: The proof of Theorem 6 involves a hierarchical
bottom-up approach, using a recursion based on 2 × 2 subdivisions.
We believe it is possible to reduce the logarithmic term to a
constant, but have not yet been able to do so. The result is also
valid for P containing multipoint nets.

Theorem 7. If $n \equiv 2 \pmod 4$ or $n \equiv 3 \pmod 4$, then
$w(n) \ge \lfloor n/2 \rfloor + 1$.

Remarks: This lower bound extends that of Theorem 2 to handle
routings having arbitrarily many turns, in the cases indicated.

Theorem 8. $w_2(n) \le \lfloor n/2 \rfloor + 1$.

Remarks: This theorem refines the techniques and results of
Theorem 5 and its first corollary, moving from three-turn to
two-turn nets and improving the upper bound for odd n by one.

Discussion of the Model

Chen et al. [CFKNSTT] give an excellent overview of how
IBM uses algorithms for solving this global routing problem to
automatically wire master-slice logic arrays for their System/370
implementations. The model is particularly appropriate for
gate-array technologies where each cell might contain a single NAND
gate. Fabrication turn-around time can be very small here since
wafers can be preprocessed to contain the array of gates and
the only processing required once the logic design is finished is
to produce horizontal and vertical wiring on the last two metal
layers, to connect the gates together as desired. However, the
preprocessing involved usually fixes an upper bound on the value
of w(R) that will be allowed - if all routing channels between
gates have width 20 then the routing can not be realized if
w(R) > 20.

As noted earlier, our concern is with "worst-case" values
w(n); in practice one would expect "typical" chips to have w(P)
substantially less than w(n).

Related Work

Burstein and Pelavin [BP83] present an interesting recent
"hierarchical" approach to this global routing problem. Much
of the earlier algorithmic work (e.g. pIN83]) involved variations
on standard shortest-path algorithms, used to route one net
at a time. Some probabilistic models have been developed by
El Gamal [EG81] to estimate w(P) under various assumptions
about the average distance between the pins on a net, etc., in
a typical instance P. Johnson [J82] gives an overview of the
NP-completeness results known in this area.


Proof of Theorem 2: One-Turn Routing by Rounding


Theorem 2. $\lfloor n/2 \rfloor + 1 \le w_1(n) \le \lfloor n/2 \rfloor + 2$.


Proof: For n = 2, the lower bound example is easily constructed.
(For example, see Figure 3.) For larger values of n, simply embed
the 2-by-2 example in a "width-2 cross" of 0-turn vertical and
horizontal nets.

The upper bound is proved using the Rounding Theorem. We
first describe how to apply the Rounding Theorem to our routing
problem, and then in the next section describe a surprisingly
efficient "rounding algorithm".

We assume for convenience here that n is even. Let $x_i$ be a
0-1 valued variable associated with net i indicating which of the
two L-shaped routes will be used. The interpretation is fixed but
arbitrary. We assume here that each L-shaped route has exactly
two wire segments. If both pins for a net lie in the same row or
column we assume the two L-shaped routes are distinguished by
the inclusion of different zero-length wire segments at their ends.
(These are degenerate L-shapes with one leg of the L having zero
length.) Each assignment of 0-1 values to $x = (x_1, \ldots, x_{n^2/2})$
places an easily computed number of wire segments in each row
and column. For example, in the problem of Figure 3 the number
of wire segments in column 1 is $(1 - x_1) + x_2$.
Figure 3

It is then simple to write a set of equations specifying that
each row and column will contain exactly n/2 wire segments:

$$ Ax = b \qquad (*) $$

where A is a (2 × n) × ($n^2/2$) real-valued matrix. Each variable
$x_i$ will participate in at most four constraints, since its two
L-routes affect the wire segment count in at most two rows and two
columns. Furthermore, it is easy to check that A satisfies the
conditions of the rounding theorem with $\Delta = 2$, since each $x_i$
will enter two constraints positively and two negatively. Finally,
it is easy to see that the vector $x = (1/2, 1/2, 1/2, \ldots, 1/2)$
satisfies the equation (*), since each net endpoint will then add
1/2 to the wire segment count for its row and column.
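The claim that the all-halves vector satisfies (*) can be checked
on a small case. The sketch below counts wire segments per row and
column as linear functions of x; the convention for which L-route
corresponds to $x_i = 1$ is an assumption (the text only says the
interpretation is fixed but arbitrary), and the 2 × 2 instance is a
made-up example.

```python
from fractions import Fraction


def segment_counts(nets, x):
    """Wire-segment count of every row and column for assignment x."""
    rows, cols = {}, {}
    for ((r1, c1), (r2, c2)), xi in zip(nets, x):
        # Assumed convention: xi = 1 selects the L-route with its
        # horizontal leg in row r1 and its vertical leg in column c2;
        # xi = 0 selects the other route (row r2, column c1).
        rows[r1] = rows.get(r1, 0) + xi
        cols[c2] = cols.get(c2, 0) + xi
        rows[r2] = rows.get(r2, 0) + (1 - xi)
        cols[c1] = cols.get(c1, 0) + (1 - xi)
    return rows, cols


# Hypothetical 2 x 2 instance in which every pin is used (n = 2).
nets = [((1, 1), (2, 2)), ((1, 2), (2, 1))]
half = Fraction(1, 2)
rows, cols = segment_counts(nets, [half, half])
```

With x = (1/2, 1/2) every row and column count equals n/2 = 1,
while an integral assignment such as x = (1, 0) concentrates two
segments in one column; rounding has to limit that kind of excess.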

Applying the rounding theorem, we infer the existence of a
0-1 valued vector $\hat{x}$ such that

$$ A\hat{x} \le b + (2, 2, \ldots, 2). $$

Except for the claims regarding running times, this proves
Theorem 2. ∎


Proof of Theorem 3: The Rounding Algorithm


The execution time of our rounding algorithm is $O(r^3 \log(s))$.
In our routing application, we have r = 2n (one equality for each
row or column) and $s = n^2/2$ (one variable for each net), so the
execution time is $O(n^3 \log(n))$. This compares favorably with the
more usual approach based on shortest paths, which runs in time
$O(n^4)$ to route an n × n array.

The steps of our rounding algorithm are:

Step 1. [Convert to 0-1 problem.] Replace x by x - x', where
$x'_i = \lfloor x_i \rfloor$ for all i. Replace b by b - Ax'. Solve the
modified problem (steps 2 to 3) and then convert back by adding x'
to the $\hat{x}$ computed and Ax' to the $\hat{b}$ computed. Halt.

Step 2. [Fast reduction in the number of variables.] This step
reduces the number of variables to $\le r$ by $O(\log(s))$ passes
through steps 2a - 2f.

2a. [Test if done.] If $s \le r$, go to step 3.

2b. [Grouping.] Divide the s variables into (r + 1) groups,
each of roughly the same number of variables. Consider a new
problem Cy = b where y is an (r+1)-vector having one element
for each group, and C is an r × (r + 1) matrix. C and y are obtained
from A and x by adding the constraint that within each
group each variable will have the same value. For example, the
first column of C is the sum of the columns of A corresponding
to variables in the first group, and $y_1$ is the sum of the $x_i$'s from
the first group.

2c. [Reduce C to row-echelon form.] Using elementary row
operations, convert the r × (r + 1) matrix C to row-echelon form,
as in Figure 4 (if C has rank r). Note that this operation does
not change the null space of C.


Figure  4 


Theorem 3. [The Rounding Theorem] Let A be a real-valued r × s
matrix and let $\Delta$ be a positive real number such that in every
column of A,

(i) the sum of the positive elements is $\le \Delta$, and
(ii) the sum of the negative elements is $\ge -\Delta$.

Let x be an s-vector and b an r-vector such that Ax = b. Then
there exists an integral s-vector $\hat{x}$ such that

(i) for all i, $1 \le i \le s$, either $\hat{x}_i = \lfloor x_i \rfloor$ or
$\hat{x}_i = \lceil x_i \rceil$ (i.e. $\hat{x}$ is a "rounded" version of x).

(ii) $A\hat{x} = \hat{b}$, where $\hat{b}_i - b_i \le \Delta$ for $1 \le i \le r$.

Furthermore, $\hat{x}$ can be computed from A, x, and b in time
$O(r^3 \log(s))$.

Proof: We now describe a "rounding algorithm" that efficiently
computes the vector $\hat{x}$ whose existence is assured by the
Rounding Theorem. The input and output parameters are as described
in that theorem:

Input: A, $\Delta$, b, x.

Output: $\hat{x}$.


2d. [Round.] Let s be an (r + 1)-vector in the null space of C.
(This is easy to compute given step 2c.) Let
$\lambda^* = \min\{\lambda \ge 0 \mid y + \lambda s \text{ has an integral component}\}$
and let

$$ w = y + \lambda^* s. $$

2e. [Update.] For each variable $x_i$ in a group j where $w_j$
is integral, fix $\hat{x}_i$ at $w_j$ and remove $x_i$ from the problem (set
$b = b - w_j \cdot A_{(i)}$, where $A_{(i)}$ is the i-th column of A, delete $x_i$
from x and delete the i-th column of A.) Set the remaining $x_i$'s
to their group values $w_j$.

2f. [Revise group structure.] If now $s \le r$, go to step 3. Else
split the largest group into two smaller ones, update C to reflect
the changes in steps 2d and 2e, and return to step 2d.

Step 3. [One by one reduction in equalities and variables.] If
$s \le r$ execute step 3a, else execute step 3b. Repeat step 3 until all
variables have been fixed. Then halt; the desired solution has
been found.
3a. [$s \le r$: Eliminate an equality.] Find an i such that the
elimination of equality i will not affect the final result (it can be
proven that such an i always exists in this case). Eliminate this
row from matrix A and from b.

3b. [Eliminate a variable.] This is much as in step 2, except
we may only eliminate one variable; here each variable is in its
own group.

This completes our description of the rounding algorithm. It
is not too difficult to verify the claimed running time. ∎
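The single "Round" move of step 2d can be sketched in isolation.
This is only that one step, not the full algorithm: it assumes a
null-space vector is already available, and the function name and
the tiny example are illustrative assumptions. Exact rational
arithmetic is used so that "integral" can be tested without
floating-point error.

```python
from fractions import Fraction
from math import ceil, floor


def round_step(y, null_vec):
    """Move y along null_vec until some component becomes integral.

    Returns w = y + lam * null_vec for the smallest lam >= 0 at which
    a component with a nonzero direction entry hits an integer.
    """
    lam = None
    for yj, sj in zip(y, null_vec):
        if sj == 0:
            continue  # this component never moves
        # Moving up reaches ceil(yj); moving down reaches floor(yj).
        target = ceil(yj) if sj > 0 else floor(yj)
        step = Fraction(target - yj) / sj
        if lam is None or step < lam:
            lam = step
    if lam is None:
        raise ValueError("null vector must have a nonzero component")
    return [yj + lam * sj for yj, sj in zip(y, null_vec)]


# Hypothetical example: C = [1 1], so (1, -1) lies in the null space of C.
y = [Fraction(1, 2), Fraction(1, 4)]
w = round_step(y, [1, -1])
```

Because the move stays in the null space of C, the product Cw
equals Cy; here the second component reaches 0 first, so that
group's variables can be fixed as in step 2e.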

Proof of Theorem 4: NP-Completeness of Optimal One-Turn Routing

Theorem 4. It is NP-complete to determine, given an instance
P of our global routing problem, whether $w_1(P) \le \lceil n/2 \rceil - 2$.

Proof: The reduction is from 3-SAT. Given an instance E of
3-SAT with variables $x_1, \ldots, x_v$ and clauses $c_1, c_2, \ldots, c_m$, set
n = 14m + 3 and define the routing problem P as follows.

Pins in the rightmost 7m + 3 columns of the grid are not
included in any net. Any 1-turn routing of P can thus have row
widths at most $7m = \lceil n/2 \rceil - 2$. Therefore, we are only concerned
with column widths in what follows.

Pins in the leftmost 7m columns but not in the middle 7 rows
are paired so that the pin in the ith row of the jth column is
linked to the pin in the (n - i + 1)st row of the jth column. Each
of these nets must be routed as a vertical wire, and the question
of whether $w_1(P) \le \lceil n/2 \rceil - 2$ is equivalent to the question of
whether the middle 7 rows can be routed with column width 2.

The middle 7 rows of the leftmost m columns are used to
represent the clauses (one column for each clause). The middle 7
rows of the next 6m columns are used to represent the variables
($2r_i$ columns for variable $x_i$, where $r_i$ is the number of times $x_i$
appears in E). The columns used to represent $c_j$ and $x_i$ are
shown in Figure 5. The order of the columns from left to right
is arbitrary.
Figure 5

If the kth term in clause $c_j$ (k = 1, 2 or 3) is $x_i$, then any
one of the + symbols in the $2r_i$ columns for $x_i$ is replaced by
$e_{jk}$. If the kth term in clause $c_j$ is $\bar{x}_i$, then any one of the -
symbols in the $2r_i$ columns for $x_i$ is replaced by $e_{jk}$. Since $x_i$
appears $r_i$ times in E, there are always enough +'s and -'s for
all the $e_{jk}$'s. The remaining +'s and -'s (as well as the dots)
are not assigned to a net. In what follows, we show that the
middle 7 rows can be routed with column width 2 if and only if
E is satisfiable.

Clearly the nets labeled with $d_{ik}$'s must be routed as vertical
wires. This leaves only two ways to route the nets labeled with
$a_{ik}$'s and $b_{ik}$'s. The two routings correspond in a natural way to
the truth value of the associated variable $x_i$. The routings are
shown in Figure 6.
Figure 6

It remains to route the $e_{jk}$'s. It is easily shown that if $e_{jk}$
corresponds to $x_i$ where $x_i$ has a true routing or to $\bar{x}_i$ where $x_i$
has a false routing, then net $e_{jk}$ can be safely routed without
using a vertical wire segment in the column for $c_j$. This is not
the case if $e_{jk}$ corresponds to $x_i$ where $x_i$ has a false routing or
to $\bar{x}_i$ where $x_i$ has a true routing. In the latter cases, the net
for $e_{jk}$ must include a vertical wire segment in the column for $c_j$
that passes through the top of the cell containing $e_{jk}$. Hence the
middle 7 rows can be routed with column width 2 if and only if
there is a k for each j such that $e_{jk}$ corresponds to $x_i$ where $x_i$
has a true routing or to $\bar{x}_i$ where $x_i$ has a false routing. This
condition is equivalent to E being satisfiable. ∎
Proof of Theorem 5: Routing using Eulerian Tours

Theorem 5. When p is even, $w_1(n,p) = pn/2$.

Proof: Let each of the cells of the n × n array be the vertices of
a graph; connect any two vertices that are connected by a
net.

Since p is even, every vertex will have an even degree.

Thus the edges can be organized into a directed path which
is an Eulerian tour, traversing each edge exactly once. (The case
that the graph is not connected arises but is easy to handle...)

For each edge (i,j) -> (k,l) of the Eulerian tour, we use the
L-shaped route with the horizontal arc first.

Since each vertex will have p/2 horizontal arcs leaving it and p/2
vertical arcs entering it, we can route the entire chip with pn/2
tracks in each row or column.

To prove the first corollary (p = 1), we group the cells into
2 × 2 squares and apply the above construction for p = 4.

Then we may need to introduce small (length 1) jogs within
each square to get the two horizontal arcs leaving on different
rows. This we can do with only one extra track for each row or
column, yielding $\lceil n/2 \rceil + 1$ tracks at most. Here each L-shaped
route may have a little tail at each end so a net may have three
turns total. ∎
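The orientation step of this argument can be sketched as below.
Rather than building one explicit tour, the sketch orients edges
along closed trails, which yields the same balance property (equal
in- and out-degree at every vertex) and handles disconnected graphs
directly; that substitution, the function name, and the small
example are assumptions made for illustration.

```python
from collections import defaultdict


def euler_orient(edges):
    """Orient undirected edges so in-degree == out-degree everywhere.

    Requires every vertex to have even degree, as when p is even and
    all pins are used. Returns a list of directed (tail, head) pairs.
    """
    adj = defaultdict(list)              # vertex -> [(neighbor, edge index)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    used = [False] * len(edges)
    oriented = []
    for start in list(adj):
        cur = start
        while True:
            # Discard edges already traversed from their other endpoint.
            while adj[cur] and used[adj[cur][-1][1]]:
                adj[cur].pop()
            if not adj[cur]:
                break  # with all degrees even, the walk is back at start
            nxt, idx = adj[cur].pop()
            used[idx] = True
            oriented.append((cur, nxt))
            cur = nxt
    return oriented


# Hypothetical p = 2 instance on a 2 x 2 array: four nets forming a cycle.
A, B, C, D = (1, 1), (1, 2), (2, 1), (2, 2)
arcs = euler_orient([(A, B), (B, D), (D, C), (C, A)])
```

Each directed arc (i,j) -> (k,l) is then routed horizontally along
row i to column l and vertically along column l to row k, so every
cell has p/2 horizontal arcs leaving it and p/2 vertical arcs
entering it.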

Proof of Theorem 6: Provably Good Routing

Theorem 6. There is a polynomial time approximation
algorithm achieving

$$ w(R) \le O\big(w(P) \cdot \log(pn / w(P))\big) $$

for any problem instance P.

Proof (Sketch): Let cut(P) denote the maximum, over all
sub-squares of the n × n array, of the number of nets which must
cross the border of the square, divided by the perimeter of that
square. It is easy to see that cut(P) is a lower bound on w(P).

Divide the chip into squares whose sides have length
$\lambda = cut(P)/p$. Route these squares independently, in an arbitrary
1-turn manner in width at most O(cut(P)), routing nets that must
leave a square arbitrarily to a point on the perimeter of that
square. Then proceed through $\log(n/\lambda)$ levels of bottom-up
recursion, at each level putting together four squares from the
previous level in a 2 × 2 pattern, and using at most O(cut(P))
additional width to route all nets that leave the newly constructed
square to the perimeter of that square. ∎
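For two-pin nets the cut measure can be computed by brute force, as
in the sketch below. Two points are assumptions: a net "must cross
the border" exactly when one of its pins lies inside the sub-square
and the other outside, and the perimeter of a sub-square of side k
is taken as 4k cell edges even when part of it lies on the chip
boundary. The function name and example are illustrative.

```python
from fractions import Fraction


def cut(n, nets):
    """Brute-force cut(P) for two-pin nets on an n x n array (1-indexed)."""
    best = Fraction(0)
    for k in range(1, n + 1):              # side length of the sub-square
        for r0 in range(1, n - k + 2):     # top row of the sub-square
            for c0 in range(1, n - k + 2): # left column of the sub-square
                def inside(pin):
                    r, c = pin
                    return r0 <= r < r0 + k and c0 <= c < c0 + k
                # A net crosses the border iff exactly one pin is inside.
                crossing = sum(1 for a, b in nets if inside(a) != inside(b))
                best = max(best, Fraction(crossing, 4 * k))
    return best


# Hypothetical 2 x 2 instance with two diagonal nets.
value = cut(2, [((1, 1), (2, 2)), ((1, 2), (2, 1))])
```

The triple loop visits on the order of n^3 sub-squares, which is
fine for small examples but is not meant as an efficient
implementation.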

Proof of Theorem 7: Improved Lower Bound

Theorem 7. If $n \equiv 2 \pmod 4$ or $n \equiv 3 \pmod 4$, then
$w(n) \ge \lfloor n/2 \rfloor + 1$.

Proof: Let $f = \lfloor n/2 \rfloor$ and $c = \lceil n/2 \rceil$. Consider dividing
the chip as shown in Figure 7 into four quadrants A, B, C, D, where
A is f × f, B and C are f × c, and D is c × c.

          A | B
          --+--
          C | D

Figure 7

Consider a problem instance where each pin of A is to be
connected to a corresponding pin in D, and each pin in B is
connected with a pin in C. (If c > f, the remaining pins in D can
be left unattached, or paired off.) Since $f^2$ is odd, at least
$\lceil f^2/2 \rceil$ of the wires from A must run through B (without loss of
generality - the case for C is symmetric). Thus the perimeter of
B will be crossed at least $(f^2 + 1) + fc$ times: $(f^2 + 1)$ times for
the A-D nets and fc times for the B-C nets. Since the perimeter
of B is crossed by only f + c channels (rows or columns), at least
one of these channels must contain at least

$$ \left\lceil \frac{(f^2 + 1) + fc}{f + c} \right\rceil = f + 1 $$

wires. ∎
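The counting at the end of this proof is easy to check numerically.
The sketch below only evaluates the averaging bound for a given n;
the function name is an assumption, and it takes the crossing
counts stated in the proof as given rather than deriving them.

```python
def theorem7_bound(n):
    """Averaging bound on the busiest channel crossing the perimeter of B."""
    f, c = n // 2, (n + 1) // 2        # f = floor(n/2), c = ceil(n/2)
    crossings = (f * f + 1) + f * c    # A-D nets plus B-C nets
    channels = f + c                   # rows and columns of B
    return -(-crossings // channels)   # integer ceiling of the average


# Values of n covered by the theorem: n = 2 or 3 (mod 4).
bounds = {n: theorem7_bound(n) for n in range(2, 40) if n % 4 in (2, 3)}
```

Since $f^2 + fc + 1 = f(f + c) + 1$, the average exceeds f by
1/(f + c), so the ceiling is f + 1 for every such n.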

Proof of Theorem 8: Good Two-Turn Routings

Theorem 8. $w_2(n) \le \lfloor n/2 \rfloor + 1$.

Proof: This is similar to the proof of the first corollary to Theorem
5, except that we group the cells regularly into 1 × 2 rectangles
instead of 2 × 2 squares. The Eulerian theorem is applied as
before. Finally, the L-shaped routings obtained will have to have
at most one tail added to produce the final routing. When n is
even it is easy to arrange the tails without increasing the number
of tracks required per channel by more than one. When n is odd
the argument is a little more delicate. Consider labelling each
pin either "H" or "V" according to whether the route determined
by the Eulerian tour would connect to that pin with a horizontal
or vertical segment. Figure 8 shows a labelling that might result
for a problem with n = 7.
Figure 8
We are guaranteed that each 1 × 2 rectangle contains one
"H" and one "V" by the use of the Eulerian tour, and we need
to guarantee that each row has at most $\lfloor n/2 \rfloor + 1$ H's and that
each column has at most $\lfloor n/2 \rfloor + 1$ V's. The rows are already
OK, if the tiling pattern is like that of Figure 8. To adjust the
columns we note that by running a short tail within a rectangle
we can effectively move a V "on top of" its neighboring H. We
can do this safely only in rows which have a "V" in the rightmost
column; otherwise the tail might increase the required channel
width. However, there are $\lfloor n/2 \rfloor$ V's in the rightmost column, so we
can always move as many as $\lfloor n/2 \rfloor$ V's out of any column into a
neighboring one. Thus we can use the tails to guarantee that no
column will have more than $\lfloor n/2 \rfloor + 1$ V's. ∎

Open Problems

We present here some open problems related to the above
results. (We hope to be able to answer some of them in our

Open Problem 1: Is there a constant c and a polynomial-time
global routing algorithm A such that A will produce for any
problem instance P a routing R with $w(R) \le c \cdot w(P)$ (i.e. a
routing whose width is within a constant factor of optimal)?

Open Problem 2: What is $w_t(n)/w(n)$ for any fixed t? Is there a
fixed t for which this ratio equals 1 for all n?

Open Problem 3: Can the logarithmic factor in the running time
of the Rounding Algorithm be eliminated?

Open Problem 4: What are other applications of the Rounding
Algorithm? (We do know of some ways of applying the algorithm
for global routing applications that are more general than the
techniques given in this abstract. We suspect that the algorithm
may have a large number of useful applications.)

Open Problem 5: Let cut(P) be defined as in the proof of Theorem
6. Is there a constant c such that $w(P) \le c \cdot cut(P)$ for all
problem instances P? (Note: we can prove that a similar measure
computing wire-length within subsquares is linearly related to
the cut measure.)

Open Problem 6: Can the additive "+p" term be improved in
the second corollary to Theorem 5?

Open Problem 7: Develop, empirically or otherwise, a good model
of the wiring problem instances that arise in practice.